Abstract

We develop a new approach to vector quantization, which guarantees an intrinsic station- 
arity property that also holds, in contrast to regular quantization, for non-optimal quantiza- 
tion grids. This goal is achieved by replacing the usual nearest neighbor projection operator 
for Voronoi quantization by a random splitting operator, which maps the random source to 
the vertices of a triangle or d-simplex. In the quadratic Euclidean case, it is shown that these
triangles or d-simplices make up a Delaunay triangulation of the underlying grid. 

Furthermore, we prove the existence of an optimal grid for this Delaunay - or dual - 
quantization procedure. We also provide a stochastic optimization method to compute such 
optimal grids, here for higher dimensional uniform and normal distributions. A crucial fea- 
ture of this new approach is the fact that it automatically leads to a second order quadrature 
formula for computing expectations, regardless of the optimality of the underlying grid. 

Keywords: Quantization, Stationarity, Voronoi tessellation, Delaunay triangulation, Numerical 
integration. 

MSC 2010: 60F25, 65C50, 65D32 

1 Introduction and motivation 

Quantization of random variables aims at finding the best $p$-th mean approximation to a random
vector (r.v.) $X : (\Omega, \mathcal{S}, \mathbb{P}) \to (\mathbb{R}^d, \mathcal{B}^d)$, where $\mathbb{R}^d$ is equipped with a norm $\|\cdot\|$. That means, for
$X \in L^p_{\mathbb{R}^d}(\mathbb{P})$, $p > 0$, that we have to minimize

\[ \mathbb{E} \min_{x \in \Gamma} \|X - x\|^p \tag{1} \]

over all finite grids $\Gamma \subset \mathbb{R}^d$ of a given size (the term grid is a convenient synonym for nonempty
finite subset of $\mathbb{R}^d$). This problem has its origin in the field of signal processing in the late 1940s.
A mathematically rigorous and comprehensive exposition of this topic can be found in the book
of Graf and Luschgy [7].

Using the nearest neighbor projection, we are able to construct a random variable $\widehat X^\Gamma$ which
achieves the minimum in (1). Such an approximation, which is called Voronoi quantization, has
been successfully applied to various problems in applied probability theory and mathematical
finance, e.g. multi-asset American/Bermudan style options pricing and $\delta$-hedging (see [TJ [5]),
swing options and gas supply contracts on energy markets (stochastic control) (see [3JIHGII), nonlinear
filtering methods for stochastic volatility estimation (see [TU1 [IH EH [H] ), discretization of SPDE's
(stochastic Zakai and McKean-Vlasov equations) (see [6]).

In particular, we may use optimal quantizations to establish numerical cubature formulas, i.e. to
approximate $\mathbb{E}F(X)$ by

\[ \mathbb{E}F(\widehat X^\Gamma) = \sum_{x \in \Gamma} w_x \, F(x), \]

where $w_x = \mathbb{P}(\widehat X^\Gamma = x)$.
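As a minimal sketch of this cubature formula, the following Python snippet builds the weights $w_x$ from the nearest neighbor projection in dimension one. The sample list is only a stand-in for draws of $X$, and the function names are illustrative, not taken from the paper.

```python
from collections import Counter

def nearest(grid, xi):
    """Nearest neighbour projection of xi onto a 1-d grid (ties go to the first grid point)."""
    return min(grid, key=lambda x: abs(xi - x))

def voronoi_cubature(grid, samples, F):
    """Return (weights, sum_x w_x F(x)) where w_x is the empirical mass of the Voronoi cell of x."""
    counts = Counter(nearest(grid, s) for s in samples)
    weights = {x: counts.get(x, 0) / len(samples) for x in grid}
    return weights, sum(w * F(x) for x, w in weights.items())

grid = [0.25, 0.75]
samples = [0.1, 0.2, 0.4, 0.6, 0.9]   # hypothetical draws of X
weights, approx = voronoi_cubature(grid, samples, lambda x: x * x)
```

Once the weights are stored, evaluating the formula for a new integrand only costs one weighted sum over the grid.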

Such a cubature formula is known to be optimal in the class of Lipschitz functionals, and it holds
for a Lipschitz functional $F$ (with Lipschitz ratio $[F]_{Lip}$)

\[ |\mathbb{E}F(X) - \mathbb{E}F(\widehat X^\Gamma)| \le [F]_{Lip} \, \mathbb{E}\|X - \widehat X^\Gamma\|. \tag{2} \]

If $F$ exhibits a bit more smoothness, i.e. is differentiable with Lipschitz continuous differential
$F'$, and $\widehat X^\Gamma$ fulfills the so-called stationarity property

\[ \mathbb{E}(X \mid \widehat X^\Gamma) = \widehat X^\Gamma, \tag{3} \]

we can derive by means of a Taylor expansion the second order rate

\[ |\mathbb{E}F(X) - \mathbb{E}F(\widehat X^\Gamma)| \le [F']_{Lip} \, \mathbb{E}\|X - \widehat X^\Gamma\|^2. \]

Unfortunately, the stationarity property for the Voronoi quantization $\widehat X^\Gamma$ is rather fragile,
since it only holds for grids $\Gamma$ which are specifically tailored and optimized for the distribution of
$X$.

That means that if a grid $\Gamma$, which has been originally constructed and optimized for $X$, is
employed to approximate a r.v. $Y$ which only slightly differs from $X$, then $\Gamma$ might still be an
arbitrarily good quantization for $Y$, i.e. $\mathbb{E}\|Y - \widehat Y^\Gamma\|^p$ is very close to the optimal quantization
error, but the stationarity property (3) is in general violated. Thus, only the first order bound
(2) is valid in this case for a cubature formula based on a Voronoi quantization of $Y$.
In this paper, we look for an alternative to the nearest neighbor projection operator and the
Voronoi quantization, which will be capable of preserving some stationarity property in the
above setting. In order to achieve this, we pass on to a product space $(\Omega_0 \times \Omega, \mathcal{S}_0 \otimes \mathcal{S}, \mathbb{P}_0 \otimes \mathbb{P})$
and introduce a random splitting operator $\mathcal{J}_\Gamma : \Omega_0 \times \mathbb{R}^d \to \Gamma$, which satisfies

\[ \mathbb{E}(\mathcal{J}_\Gamma(Y) \mid Y) = Y \]

for any $\mathbb{R}^d$-valued r.v. $Y$ defined on $(\Omega, \mathcal{S}, \mathbb{P})$ such that $\operatorname{supp}(\mathbb{P}_Y) \subset \operatorname{conv}(\Gamma)$, where $\operatorname{supp}(\mathbb{P}_Y)$ and
$\operatorname{conv}(\Gamma)$ denote the support of the distribution $\mathbb{P}_Y$ and the convex hull of $\Gamma$ respectively. Note
that this implies that $Y$ is compactly supported. As a matter of fact, such an operator fulfills
the so-called intrinsic stationarity property

\[ \mathbb{E}(\mathcal{J}_\Gamma(\xi)) = \xi, \qquad \xi \in \operatorname{conv}(\Gamma). \tag{4} \]

Although this stationarity differs from the one defined above, one may again derive a second
order error bound for a differentiable function $F$ with Lipschitz derivative,

\[ |\mathbb{E}F(Y) - \mathbb{E}F(\mathcal{J}_\Gamma(Y))| \le [F']_{Lip} \, \mathbb{E}\|Y - \mathcal{J}_\Gamma(Y)\|^2, \]

which now holds for any r.v. $Y$ regardless of the grid $\Gamma$, provided that $\operatorname{supp}(\mathbb{P}_Y) \subset \operatorname{conv}(\Gamma)$.

On our way, we will make the connection with functional approximation by noting that the
functional operator related to $\mathcal{J}_\Gamma$ defined by

\[ \mathbb{J}_\Gamma(F) := \big( \xi \mapsto \mathbb{E}_{\mathbb{P}_0} F(\mathcal{J}_\Gamma(\omega_0, \xi)) \big) \]

is in standard situations a (classical) continuous piecewise affine interpolation approximation of
$F$.

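One may naturally ask at this stage for the best possible approximation power of $\mathcal{J}_\Gamma(X)$ to $X$,
i.e. minimize the $p$-th power mean error

\[ \mathbb{E}\|X - \mathcal{J}_\Gamma(X)\|^p \]

over all grids of size not exceeding $n$ and all random operators $\mathcal{J}_\Gamma$ fulfilling the intrinsic stationarity
property (4).

This means that we will deal for $n \in \mathbb{N}$ with the mean error modulus

\[ d_n^p(X) = \inf\big\{ \mathbb{E}\|X - \mathcal{J}_\Gamma(X)\|^p : \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n,\ \operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma),\ \mathcal{J}_\Gamma : \Omega_0 \times \mathbb{R}^d \to \Gamma \text{ intrinsic stationary} \big\}, \tag{5} \]

where $|\Gamma|$ denotes the cardinality of $\Gamma$.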

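It will turn out in Section 2 that the problem of finding an optimal random operator $\mathcal{J}_\Gamma$ for a
grid $\Gamma = \{x_1, \dots, x_k\}$, $k \le n$, is equivalent to solving the Linear Programming problem

\[ \min_{\lambda \in \mathbb{R}^k} \sum_{i=1}^k \lambda_i \|X(\omega) - x_i\|^p \quad \text{s.t.} \quad \begin{bmatrix} x_1 & \cdots & x_k \\ 1 & \cdots & 1 \end{bmatrix} \lambda = \begin{bmatrix} X(\omega) \\ 1 \end{bmatrix},\ \lambda \ge 0. \tag{6} \]

Defining the local dual quantization function as

\[ F^p(\xi; \Gamma) = \min_{\lambda \in \mathbb{R}^k} \sum_{i=1}^k \lambda_i \|\xi - x_i\|^p \quad \text{s.t.} \quad \begin{bmatrix} x_1 & \cdots & x_k \\ 1 & \cdots & 1 \end{bmatrix} \lambda = \begin{bmatrix} \xi \\ 1 \end{bmatrix},\ \lambda \ge 0, \]

we will show that

\[ d_n^p(X) = \inf\{ \mathbb{E} F^p(X; \Gamma) : \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n \}. \tag{7} \]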



This means that the dual quantization problem actually consists of two phases: during the first
one we have to locally solve the optimization problem (6), whereas phase two, which consists
of the global optimization over all possible grids in (7), is the more involved problem. It is
highly non-linear and contains a probabilistic component, in contrast to phase one which can be
considered more or less as deterministic.

Moreover, we will see in section 3 that the solution to the Linear Programming problem (6) is in the
quadratic Euclidean case completely determined by the Delaunay triangulation spanned by $\Gamma$,
and this structure is, in the graph theoretic sense, the dual counterpart of the Voronoi diagram
on which regular quantization is based. That is also the reason why we call this new
approach dual or Delaunay quantization.

In section [21 we propose an extension of the dual quantization idea to non-compactly supported
random variables. For those and the compactly supported r.v.'s we prove the existence of optimal
quantizers in section [U, i.e. the fact that there are sets $\Gamma$ which actually achieve the infimum
in (7). Finally, in section [SJ we give numerical illustrations of some optimal dual quantizers and
numerical procedures to generate them.

In a companion paper [12], we establish the counterpart of the celebrated Zador theorem for
regular vector quantization: namely, we elucidate the sharp rate for the mean dual quantization
error modulus defined in section 2 below.

We also provide in [12] a non-asymptotic version of this theorem, which corresponds to the Pierce
Lemma.
First numerical applications of dual quantization to finance have been developed in a second
companion paper [13], especially for the pricing of American style derivatives like Bermuda and
swing options.

Notation: • $u^T$ will denote the transpose of the column vector $u \in \mathbb{R}^d$.

• Let $u = (u_1, \dots, u_d) \in \mathbb{R}^d$; we write $u \ge 0$ (resp. $> 0$) if $u_i \ge 0$ (resp. $> 0$), $i = 1, \dots, d$.

• $\Delta_d := \{x = (x^0, \dots, x^d) \in \mathbb{R}_+^{d+1} : x^0 + \dots + x^d = 1\}$ denotes the canonical simplex of $\mathbb{R}^{d+1}$.

• $B(x_0, r)$ is the closed ball of center $x_0 \in \mathbb{R}^d$ and radius $r > 0$ in $(\mathbb{R}^d, \|\cdot\|)$.

• $\operatorname{rk}(M)$ denotes the rank of the matrix $M$.

• $\mathbf{1}_A$ denotes the indicator function of the set $A$, $|A|$ its cardinality.

• If $A \subset E$, $E$ an $\mathbb{R}$-vector space, $\operatorname{span} A$ denotes the sub-vector space spanned by $A$.

• Let $(A_n)_{n \ge 1}$ be a sequence of sets: $\limsup_n A_n := \bigcap_n \bigcup_{k \ge n} A_k$ and $\liminf_n A_n := \bigcup_n \bigcap_{k \ge n} A_k$.

• $\lambda^d$ denotes the Lebesgue measure on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ (Borel $\sigma$-field).

2 Dual quantization and intrinsic stationarity 

First, we briefly recall the definition of the "regular" vector quantization problem for a r.v.
$X : (\Omega, \mathcal{S}, \mathbb{P}) \to (\mathbb{R}^d, \mathcal{B}^d)$, where $\mathbb{R}^d$ is equipped with a norm $\|\cdot\|$.

Definition 1. Let $X \in L^p_{\mathbb{R}^d}(\mathbb{P})$ for some $p \in [1, +\infty)$.

1. We define the (regular) $L^p$-mean quantization error for a grid $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ as

\[ e_p(X; \Gamma) = \Big\| \min_{1 \le i \le n} \|X - x_i\| \Big\|_{L^p} = \Big( \mathbb{E} \min_{1 \le i \le n} \|X - x_i\|^p \Big)^{1/p}. \]

2. The optimal regular quantization error, which can be achieved by a grid $\Gamma$ of size not
exceeding $n \in \mathbb{N}$, is given by

\[ e_{n,p}(X) = \inf\{ e_p(X; \Gamma) : \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n \}. \]

Remark. Since we will frequently consider the $p$-th power of $e_p(X; \Gamma)$ and $e_{n,p}(X)$, we will drop
a duplicate index $p$ and write, e.g., $e_n^p(X)$ instead of $e_{n,p}^p(X)$.

It can be shown that (at least) one optimal quantizer actually exists, i.e. for every $n \in \mathbb{N}$ there
is a grid $\Gamma \subset \mathbb{R}^d$ with $|\Gamma| \le n$ such that

\[ e_p(X; \Gamma) = e_{n,p}(X). \]

Moreover, this definition of the optimal quantization error is in fact equivalent to defining $e_n^p(X)$
as the best approximation error which can be achieved by a Borel transformation or by a discrete
r.v. $\widehat X$ taking at most $n$ values.

Proposition 1. Let $X \in L^p_{\mathbb{R}^d}(\mathbb{P})$, $n \in \mathbb{N}$. Then

\[ e_n^p(X) = \inf\{ \mathbb{E}\|X - f(X)\|^p : f : \mathbb{R}^d \to \mathbb{R}^d \text{ Borel measurable},\ |f(\mathbb{R}^d)| \le n \} \]
\[ \phantom{e_n^p(X)} = \inf\{ \mathbb{E}\|X - \widehat X\|^p : \widehat X \text{ is a r.v. with } |\widehat X(\Omega)| \le n \}. \]

The proof of this proposition is based on the construction of a Voronoi quantization of a r.v. by
means of the nearest neighbour projection.

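Therefore, let $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ be a grid and denote by $(C_i(\Gamma))_{1 \le i \le n}$ a Borel partition of
$\mathbb{R}^d$ satisfying

\[ C_i(\Gamma) \subset \Big\{ \xi \in \mathbb{R}^d : \|\xi - x_i\| \le \min_{1 \le j \le n} \|\xi - x_j\| \Big\}, \qquad 1 \le i \le n. \]

Such a partition is called a Voronoi partition generated by $\Gamma$, and we may define the corresponding
nearest neighbour projection as

\[ \pi_\Gamma(\xi) = \sum_{1 \le i \le n} x_i \, \mathbf{1}_{C_i(\Gamma)}(\xi). \]

The discrete r.v.

\[ \widehat X^{\Gamma, Vor} = \pi_\Gamma(X) = \sum_{1 \le i \le n} x_i \, \mathbf{1}_{C_i(\Gamma)}(X) \]

is called the Voronoi quantization induced by $\Gamma$ and satisfies

\[ e_p^p(X; \Gamma) = \mathbb{E}\|X - \pi_\Gamma(X)\|^p. \]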

As already mentioned in the introduction, the concept of stationarity plays an important role in
the application of quantization. A quantization $\widehat X$ is said to be stationary for the r.v. $X$ if it
satisfies

\[ \mathbb{E}(X \mid \widehat X) = \widehat X. \tag{8} \]

It is well known that in the quadratic Euclidean case, i.e. $p = 2$ and $\|\cdot\|$ is the Euclidean norm,
any optimal quantization (a r.v. $\widehat X$ with $|\widehat X(\Omega)| \le n$ and $\mathbb{E}\|X - \widehat X\|^p = e_n^p(X)$) fulfills this
property (this is no longer true in the present form for $p \ne 2$ or a non-Euclidean norm, see [5]).
Moreover, this stationarity condition is equivalent to the first order optimality criterion of the
optimization problem

\[ \mathbb{E} \min_{1 \le i \le n} \|X - x_i\|^2 \to \min_{x_1, \dots, x_n \in \mathbb{R}^d}, \]

i.e. the Voronoi quantization $\widehat X^{\Gamma, Vor}$ of a grid $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ satisfies the stationarity
property (8) for a r.v. $X$ whenever $\Gamma$ is a zero of the first order derivative of the mapping
$(x_1, \dots, x_n) \mapsto \mathbb{E} \min_{1 \le i \le n} \|X - x_i\|^2$.

By means of this stationarity property (8), we can derive the following second order error bound
for a cubature formula based on quantization.

Proposition 2. Let $X \in L^2_{\mathbb{R}^d}(\mathbb{P})$ and assume that $F \in C^{1,1}(\mathbb{R}^d)$ is differentiable with Lipschitz
differential. If the quantization $\widehat X^\Gamma$ for a grid $\Gamma = \{x_1, \dots, x_n\} = \widehat X^\Gamma(\Omega)$, $n \in \mathbb{N}$, satisfies

\[ \mathbb{E}(X \mid \widehat X^\Gamma) = \widehat X^\Gamma, \]

then it holds for the cubature formula $\mathbb{E}F(\widehat X^\Gamma) = \sum_{i=1}^n \mathbb{P}(\widehat X^\Gamma = x_i) \, F(x_i)$

\[ |\mathbb{E}F(X) - \mathbb{E}F(\widehat X^\Gamma)| \le [F']_{Lip} \, \mathbb{E}\|X - \widehat X^\Gamma\|^2. \]

Proof. From a Taylor expansion we obtain for $\widehat X = \widehat X^\Gamma$

\[ |F(X) - F(\widehat X) - F'(\widehat X)(X - \widehat X)| \le [F']_{Lip} \|X - \widehat X\|^2, \]

so that taking conditional expectations and applying Jensen's inequality yield

\[ |\mathbb{E}(F(X) \mid \widehat X) - F(\widehat X) - \mathbb{E}(F'(\widehat X)(X - \widehat X) \mid \widehat X)| \le [F']_{Lip} \, \mathbb{E}(\|X - \widehat X\|^2 \mid \widehat X). \]

The stationarity assumption then implies

\[ \mathbb{E}(F'(\widehat X)(X - \widehat X) \mid \widehat X) = F'(\widehat X) \, \mathbb{E}(X - \widehat X \mid \widehat X) = 0, \]

so that the assertion follows again from taking expectations and Jensen's inequality. □

Unfortunately, the above stationarity is a rather fragile property, since it only holds for Voronoi
quantizations whose underlying grid is specifically optimized for the distribution of $X$. Thus,
this stationarity will in general fail as soon as we modify the underlying r.v., even only slightly.
Nevertheless, there is a second way to derive the second order error bound of Proposition 2.
Assume that $\widehat X$ is a discrete r.v. satisfying a somewhat dual stationarity property

\[ \mathbb{E}(\widehat X \mid X) = X. \tag{9} \]

In this case we can perform, as in the proof of Proposition 2, a Taylor expansion, but this time
with respect to $X$. We then conclude from (9)

\[ \mathbb{E}(F'(X)(\widehat X - X) \mid X) = 0, \]

so that finally the same assertion will hold.

As we will see later on, this stationarity condition will be intrinsically fulfilled by the dual
quantization operator. Thus, this new approach will be very robust with respect to changes
in the underlying r.v.s, since it always preserves stationarity.
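The fragility of the regular stationarity property can be checked by hand in dimension one. The sketch below assumes $X \sim U([0,1])$, for which the conditional mean of $X$ on a Voronoi cell is the midpoint of that cell; the grids and function names are illustrative choices, not taken from the paper.

```python
def voronoi_cell_means(grid):
    """Conditional means E(X | X_hat = x_i) for X ~ U([0,1]) and a sorted grid inside (0,1)."""
    # Cell boundaries are the midpoints between neighbouring grid points, plus 0 and 1.
    edges = [0.0] + [(a + b) / 2 for a, b in zip(grid, grid[1:])] + [1.0]
    # The uniform law conditioned on an interval has its mean at the interval midpoint.
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

def is_stationary(grid, tol=1e-12):
    """Check E(X | X_hat) = X_hat cell by cell."""
    return all(abs(m - x) <= tol for m, x in zip(voronoi_cell_means(grid), grid))

optimal = [1 / 6, 1 / 2, 5 / 6]    # optimal quadratic 3-point grid for U([0,1])
perturbed = [0.2, 0.5, 0.8]        # slightly shifted grid
```

Here `is_stationary(optimal)` holds while `is_stationary(perturbed)` fails, although the perturbed grid is still a reasonable quantizer of the uniform distribution.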

2.1 Definition of dual quantization 

We define here the dual quantization error by means of the local dual quantization error $F_p$, since,
doing so, we are able to introduce dual quantization along the lines of regular quantization. The
stationarity property will then appear as a characterizing property of the Delaunay quantization
and the dual quantization operator, the counterparts of Voronoi quantization and the nearest
neighbour projection.

The equivalence of the following Definition 2 and (5) will be given in Theorem 2, which provides
a statement for dual quantization analogous to Proposition 1.
Without loss of generality assume from here on that

\[ \operatorname{span}(\operatorname{supp}(\mathbb{P}_X)) = \mathbb{R}^d, \]

i.e. $X$ is a true $d$-dimensional random variable. Otherwise we would reduce $d$. In the definitions
below, we use the usual convention $\inf \emptyset = +\infty$.

Definition 2. Let $X \in L^p_{\mathbb{R}^d}(\mathbb{P})$ for some $p \in [1, \infty)$.

(a) The local dual quantization error induced by a grid $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ and $\xi \in \mathbb{R}^d$ is
defined by

\[ F_p(\xi; \Gamma) = \inf\Big\{ \Big( \sum_{1 \le i \le n} \lambda_i \|\xi - x_i\|^p \Big)^{1/p} : \lambda_i \ge 0,\ \sum_{1 \le i \le n} \lambda_i x_i = \xi,\ \sum_{1 \le i \le n} \lambda_i = 1 \Big\}. \]

(b) The $L^p$-mean dual quantization error for $X$ induced by the grid $\Gamma$ is then given by

\[ d_p(X; \Gamma) = \|F_p(X; \Gamma)\|_{L^p} = \Big( \mathbb{E} \inf\Big\{ \sum_{1 \le i \le n} \lambda_i \|X - x_i\|^p : \lambda_i \ge 0,\ \sum_{1 \le i \le n} \lambda_i x_i = X,\ \sum_{1 \le i \le n} \lambda_i = 1 \Big\} \Big)^{1/p}. \]

(c) The optimal dual quantization error, which can be achieved by a grid $\Gamma$ of size not exceeding
$n$, will be denoted by

\[ d_{n,p}(X) = \inf\{ d_p(X; \Gamma) : \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n \}. \]

Remarks. • Note that, like in the case of regular (Voronoi) quantization, the optimal dual
quantization error actually depends only on the distribution of $X$.
• Note that $F_p(\xi, \Gamma) \ge \operatorname{dist}(\xi, \Gamma)$ and consequently $d_p(X, \Gamma) \ge e_p(X, \Gamma)$.

• In most cases we will deal with the $p$-th power of $F_p$, $d_p$ and $d_{n,p}$. To avoid duplicating indices,
we will write $F^p$, $d^p$ and $d_n^p$ instead of $F_p^p$, $d_p^p$ and $d_{n,p}^p$.

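Denoting $\Gamma = \{x_1, \dots, x_n\}$, we recognize that $F^p(\xi; \Gamma)$ is given by the linear programming problem

\[ \min_{\lambda \in \mathbb{R}^n} \sum_{i=1}^n \lambda_i \|\xi - x_i\|^p \quad \text{s.t.} \quad \begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix} \lambda = \begin{bmatrix} \xi \\ 1 \end{bmatrix},\ \lambda \ge 0. \tag{LP} \]

Clearly, we have $F^p(\xi; \Gamma) \ge 0$ for every $\xi \in \mathbb{R}^d$, $\Gamma \subset \mathbb{R}^d$, so that it follows from the constraints

\[ \begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix} \lambda = \begin{bmatrix} \xi \\ 1 \end{bmatrix}, \qquad \lambda \ge 0 \tag{10} \]

that (LP) has a finite solution if and only if $\xi \in \operatorname{conv}(\Gamma)$.

Proposition 3. (a) Let $p \in [1, +\infty)$ and assume $\operatorname{supp}(\mathbb{P}_X)$ is compact. Then $d_{n,p}(X) < +\infty$ if
and only if $n \ge d + 1$.

(b) Let $p \in [1, +\infty)$. It holds

\[ \{ d_p(X; \cdot) < +\infty \} = \{ \Gamma \subset \mathbb{R}^d : \operatorname{conv}(\Gamma) \supset \operatorname{supp}(\mathbb{P}_X) \}. \]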

Proof, (a) Let £0 G supp(Pjf) and i? > such that suppPx C -B«°°(£o,"f) (closed ball w.r.t. 
the l°°-norm). Note that [— -f , C — -§ 1 + RAj where denotes the canonical simplex. 
Consequently 

supp(Pjr) c^ -|l + i?A d = conv(r ), T = {£0 - R/2 + ReJ , j = 0, . . . ,d} 
where e° = and (e?)i<j<d denotes the canonical basis of M d . Consequently 

v^esupp(Px), F p (e ; r ) <S(T ) 

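where $\delta(A) := \sup_{x, y \in A} \|x - y\|$ denotes the diameter of $A$. More generally, for every grid $\Gamma$ such
that $\operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma)$, $F_p(\xi; \Gamma) < +\infty$ for every $\xi \in \operatorname{supp}(\mathbb{P}_X)$.
Hence, for every $n \ge |\Gamma_0| = d + 1$,

\[ d_{n,p}(X) \le d_p(X; \Gamma_0) \le \delta(\Gamma_0). \]

If $n \le d$, the convex hull of a grid $\Gamma$ cannot contain $\operatorname{supp}(\mathbb{P}_X)$: if so, it contains its convex hull
$\operatorname{conv}(\operatorname{supp}(\mathbb{P}_X))$ as well, which is impossible since the latter has a nonempty interior whereas
$\operatorname{conv}(\Gamma)$ is at most $(n-1)$-dimensional.

(b) It follows from what precedes that $d_p(X; \Gamma) < +\infty$ if $\operatorname{conv}(\Gamma) \supset \operatorname{supp}(\mathbb{P}_X)$. Conversely, if
$\operatorname{conv}(\Gamma) \not\supset \operatorname{supp}(\mathbb{P}_X)$, there exists $\xi_0 \in \operatorname{supp}(\mathbb{P}_X) \setminus \operatorname{conv}(\Gamma)$. Let $\varepsilon_0 > 0$ be such that $B(\xi_0, \varepsilon_0) \cap
\operatorname{conv}(\Gamma) = \emptyset$. On $B(\xi_0, \varepsilon_0)$, $F_p(\cdot, \Gamma) = +\infty$ and $\mathbb{P}_X(B(\xi_0, \varepsilon_0)) > 0$, hence $d_p(X; \Gamma) = +\infty$. □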

2.2 Preliminaries on the local dual quantization functional 

Before we deal in detail with the dual quantization error for random variables, we have to derive
some basic properties of the local dual quantization error functional $F_p$.
To alleviate notations, we introduce throughout the paper the abbreviations

\[ c = \begin{bmatrix} \|\xi - x_1\|^p \\ \vdots \\ \|\xi - x_n\|^p \end{bmatrix}, \qquad A = \begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} \xi \\ 1 \end{bmatrix}, \]

at least whenever $\Gamma$ and/or $\xi$ are fixed, so that (LP) can be written as

\[ \min_{\lambda \in \mathbb{R}^n,\ A\lambda = b,\ \lambda \ge 0} \lambda^T c. \]

Moreover, for every set $I \subset \{1, \dots, n\}$, $A_I = [a_{ij}]_{j \in I}$ will denote the submatrix of $A$ whose
columns correspond to the indices in $I$, and $c_I = [c_i]_{i \in I}$ will denote the subvector of $c$ whose rows
are determined by $I$. Finally, $\operatorname{aff.dim}(\Gamma)$ will denote the dimension of the affine manifold spanned
by the grid $\Gamma$ in $\mathbb{R}^d$.

Since it follows from Proposition 3 that, for any grid $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ with $\operatorname{aff.dim}(\Gamma) < d$,
$d_p(X; \Gamma) = +\infty$, we will restrict in the sequel to grids with $\operatorname{aff.dim}(\Gamma) = d$ or, equivalently,
satisfying $\operatorname{rk} \begin{bmatrix} x_1 & \cdots & x_n \\ 1 & \cdots & 1 \end{bmatrix} = d + 1$. The following proposition is straightforward.



Proposition 4. (see e.g. J2f) For every $\xi \in \operatorname{conv}(\Gamma)$, (LP) has a solution $\lambda^* \in \mathbb{R}^n$ which
is an extremal point of the compact set of linear constraints $\{\lambda \in \mathbb{R}^n : A\lambda = b,\ \lambda \ge 0\}$, so that the columns
$\begin{bmatrix} x_i \\ 1 \end{bmatrix}$, $i \in \{j : \lambda_j^* > 0\}$, are independent. Hence (by the incomplete basis theorem), there exists a fundamental basis
$I^* \subset \{1, \dots, n\}$ such that $|I^*| = d + 1$, the columns $\begin{bmatrix} x_j \\ 1 \end{bmatrix}$, $j \in I^*$, are linearly independent and,
after reordering the rows,

\[ \lambda^* = \begin{bmatrix} \lambda_{I^*} \\ 0 \end{bmatrix} \quad \text{where } \lambda_{I^*} = A_{I^*}^{-1} b. \tag{11} \]

(Saying that $I^*$ is a basis rather than $\begin{bmatrix} x_i \\ 1 \end{bmatrix}$, $i \in I^*$, is a convenient abuse of notation.) This means
that the entries of $\lambda^*$ corresponding to $I^*$ are given by $A_{I^*}^{-1} b$, the remaining ones being equal
to 0.

Consequently, the linear programming problem (LP) always admits a solution $\lambda^*$ whose nonzero
components correspond to at most $d + 1$ affinely independent points $x_j$ in $\Gamma$, i.e. an optimal
triangle in $\mathbb{R}^2$ or a $d$-simplex in $\mathbb{R}^d$.
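Because an optimal solution can always be found on a basis, the local problem in the plane can be solved, for small grids, by plain enumeration of all triangles. The following Python sketch does exactly that with the Euclidean norm; it is a brute-force illustration under these assumptions (function names and the sample grid are hypothetical), not the numerical procedure of the paper.

```python
from itertools import combinations
from math import dist, inf

def barycentric(tri, xi):
    """Solve [x_a x_b x_c; 1 1 1] lam = [xi; 1] by Cramer's rule; None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = tri
    det = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if abs(det) < 1e-14:
        return None
    px, py = xi
    lb = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / det
    lc = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / det
    return (1.0 - lb - lc, lb, lc)

def local_dual_error(grid, xi, p=2, tol=1e-12):
    """Return (F^p(xi; grid), best index triple); (inf, None) if xi lies outside conv(grid)."""
    best, best_basis = inf, None
    for basis in combinations(range(len(grid)), 3):
        lam = barycentric([grid[i] for i in basis], xi)
        if lam is None or min(lam) < -tol:
            continue  # not a basis, or not primal feasible for xi
        cost = sum(l * dist(xi, grid[i]) ** p for l, i in zip(lam, basis))
        if cost < best:
            best, best_basis = cost, basis
    return best, best_basis

# Unit square plus its centre; xi lies in the bottom triangle around the centre.
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
value, basis = local_dual_error(grid, (0.5, 0.25))
```

The enumeration visits all triples, so its cost grows cubically with the grid size; it is only meant to make the role of the bases concrete.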

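Since the whole minimization problem can therefore be restricted to such triangles or $d$-simplices,
we introduce the set of bases (or admissible indices) for a grid $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$ as

\[ \mathcal{I}(\Gamma) = \{ I \subset \{1, \dots, n\} : |I| = d + 1 \text{ and } \operatorname{rk}(A_I) = d + 1 \}. \]

Moreover, we denote the optimality region for a basis $I \in \mathcal{I}(\Gamma)$ by

\[ D_I(\Gamma) = \Big\{ \xi \in \mathbb{R}^d : \lambda_I = A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} \ge 0 \text{ and } \sum_{j \in I} \lambda_j \|\xi - x_j\|^p = F^p(\xi; \Gamma) \Big\}. \]

A useful reformulation of the above linear programming problem (LP) is given by its dual version
(see e.g. [S], Theorem 3, p. 91).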

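Proposition 5 (Duality). The dual problem of (LP) reads

\[ \max_{u_1 \in \mathbb{R}^d,\ u_2 \in \mathbb{R}} u_1^T \xi + u_2 \quad \text{s.t.} \quad \begin{bmatrix} x_1^T & 1 \\ \vdots & \vdots \\ x_n^T & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \le \begin{bmatrix} \|\xi - x_1\|^p \\ \vdots \\ \|\xi - x_n\|^p \end{bmatrix} \tag{DLP} \]

\[ = \max_{u \in \mathbb{R}^d} \min_{1 \le i \le n} \big\{ \|\xi - x_i\|^p + u^T (\xi - x_i) \big\}. \]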
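The max-min form of the dual problem is easy to test numerically in dimension one, where $u$ is a scalar. The sketch below (hypothetical function names, grid and evaluation point chosen for illustration) compares the primal value, obtained by enumerating all bracketing pairs, with the dual objective on a range of values of $u$.

```python
def primal(grid, xi, p=2):
    """F^p(xi; grid) in dimension 1 by enumerating all pairs (bases) that bracket xi."""
    best = float("inf")
    for i, a in enumerate(grid):
        for b in grid[i + 1:]:
            lo, hi = min(a, b), max(a, b)
            if lo < hi and lo <= xi <= hi:
                lam = (hi - xi) / (hi - lo)          # weight of the left point
                best = min(best, lam * abs(xi - lo) ** p + (1 - lam) * abs(xi - hi) ** p)
    return best

def dual_objective(grid, xi, u, p=2):
    """min_i { |xi - x_i|^p + u * (xi - x_i) }, a lower bound of F^p(xi; grid) for every u."""
    return min(abs(xi - x) ** p + u * (xi - x) for x in grid)

grid, xi = [0.0, 1.0, 3.0], 0.4
value = primal(grid, xi)
bounds = [dual_objective(grid, xi, k / 10) for k in range(-50, 51)]
```

In this example every entry of `bounds` stays below `value`, and the bound is attained at $u = 0.2$, in line with the absence of a duality gap.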



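An important criterion to check whether a triangle or a $d$-simplex in $\Gamma$ is optimal is given by
the following characterization of optimality in Linear Programs (see e.g. [5], Theorem 3 and
Remarks 6.4 and 6.5 that follow).
Proposition 6 (Optimality Conditions). Let $\Gamma$ be a grid of $\mathbb{R}^d$ with $\operatorname{aff.dim}(\Gamma) = d$ and let
$\xi \in \operatorname{conv}(\Gamma)$.

(a) If a basis $I \in \mathcal{I}(\Gamma)$ is primal feasible, i.e.

\[ \lambda_I = A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} \ge 0, \]

as well as dual feasible, i.e.

\[ A^T u \le \begin{bmatrix} \|\xi - x_1\|^p \\ \vdots \\ \|\xi - x_n\|^p \end{bmatrix} \quad \text{for } u = (A_I^T)^{-1} c_I, \]

then

\[ \sum_{j \in I} \lambda_j \|\xi - x_j\|^p = \begin{bmatrix} \xi \\ 1 \end{bmatrix}^T u. \]

Furthermore, $\lambda_I$ and $u$ are optimal for (LP) resp. (DLP), and $I$ is called an optimal basis.

(b) Conversely, if $I \in \mathcal{I}(\Gamma)$ is an optimal basis which is additionally non-degenerate for (LP), i.e.
if there exist $\lambda \in \mathbb{R}^n$ and $u \in \mathbb{R}^{d+1}$ such that $\lambda_I = A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} > 0$, $A^T u \le c$ and $\sum_{j \in I} \lambda_j \|\xi - x_j\|^p = \begin{bmatrix} \xi \\ 1 \end{bmatrix}^T u$,
then it holds

\[ A_I^T u = c_I. \]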
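In dimension one a basis is a pair of grid points, and both feasibility conditions can be written out explicitly. The following sketch (an illustration with hypothetical names, not code from the paper) computes the primal weights, the dual vector $u = (u_1, u_2)$ solving $A_I^T u = c_I$, and checks the two conditions for a given pair.

```python
def check_basis(grid, xi, basis, p=2, tol=1e-12):
    """Primal/dual feasibility of the pair basis = (i, j) for the 1-d problem (LP).

    Returns (primal_ok, dual_ok, primal_value, dual_value).
    """
    i, j = basis
    c = [abs(xi - x) ** p for x in grid]
    lam_i = (grid[j] - xi) / (grid[j] - grid[i])     # solves A_I lam = (xi, 1)
    lam_j = 1.0 - lam_i
    u1 = (c[j] - c[i]) / (grid[j] - grid[i])         # solves A_I^T u = c_I
    u2 = c[i] - u1 * grid[i]
    primal_ok = lam_i >= -tol and lam_j >= -tol
    dual_ok = all(u1 * x + u2 <= ck + tol for x, ck in zip(grid, c))
    return primal_ok, dual_ok, lam_i * c[i] + lam_j * c[j], u1 * xi + u2

grid, xi = [0.0, 1.0, 3.0], 0.4
adjacent = check_basis(grid, xi, (0, 1))   # the two neighbours of xi
wide = check_basis(grid, xi, (0, 2))       # skips the grid point 1.0
```

The adjacent pair passes both tests and gives the value 0.24, whereas the wide pair is primal feasible but violates dual feasibility at the skipped point, so it cannot be optimal.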

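Now we may derive the continuity of $F^p$ as a function of $\xi$ on $\operatorname{conv}(\Gamma)$.

Theorem 1. Let $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$, $n \in \mathbb{N}$, be a fixed grid of size $n$. Then the function
$f_\Gamma : \operatorname{conv}(\Gamma) \to \mathbb{R}$ defined by $f_\Gamma(\xi) = F^p(\xi; \Gamma)$ is continuous.

Proof. The lower semi-continuity (l.s.c.) of $f_\Gamma$ follows directly from its dual representation

\[ f_\Gamma(\xi) = \sup_{u \in \mathbb{R}^d} \min_{1 \le i \le n} \big\{ \|\xi - x_i\|^p + u^T (\xi - x_i) \big\}, \]

since the supremum of a family of continuous functions is l.s.c.

To establish the upper semi-continuity, we proceed as follows. Let $\xi, \xi_n \in \operatorname{conv}(\Gamma)$ be such that
$\xi_n \to \xi$ as $n \to \infty$. Since $\xi, \xi_n \in \operatorname{conv}(\Gamma)$, we know that $f_\Gamma(\xi)$ and $\limsup_{n \to \infty} f_\Gamma(\xi_n)$ are upper
bounded by $\delta(\Gamma)^p$, hence finite. Moreover, there is an $I^* \in \mathcal{I}(\Gamma)$ such that $(x_i)_{i \in I^*}$ is an affine
basis and such that

\[ f_\Gamma(\xi) = \sum_{i \in I^*} \lambda_i^* \|\xi - x_i\|^p \quad \text{and} \quad \sum_{i \in I^*} \lambda_i^* x_i = \xi,\ \sum_{i \in I^*} \lambda_i^* = 1,\ \lambda_i^* \ge 0,\ i \in I^*. \]

Up to an extraction, still denoted $(f_\Gamma(\xi_n))_{n \ge 1}$, one may assume that in fact $f_\Gamma(\xi_n) \to \limsup_n f_\Gamma(\xi_n)$
and that there exists an index subset $I_0 \subset \{1, \dots, n\}$ such that, for every $n \ge 1$, $\xi_n \in \operatorname{conv}(\Gamma_{I_0})$,
where $\Gamma_I := \{x_i,\ i \in I\}$. The convex hull being closed, $\xi \in \operatorname{conv}(\Gamma_{I_0})$. Hence there exists $(\lambda_i^0)_{i \in I_0}$
such that

\[ \xi = \sum_{i \in I_0} \lambda_i^0 x_i, \qquad \sum_{i \in I_0} \lambda_i^0 = 1, \qquad \lambda_i^0 \ge 0,\ i \in I_0. \]

Now let $\xi' \in \operatorname{conv}(\Gamma_{I_0})$, i.e. $\xi' = \sum_{i \in I_0} \lambda_i' x_i$ with $\sum_{i \in I_0} \lambda_i' = 1$, $\lambda_i' \ge 0$, $i \in I_0$. Let $i_0' =
\operatorname{argmin}_i \{ \lambda_i' / \lambda_i^0 : \lambda_i^0 > 0 \}$. Then

\[ \xi' = \sum_{i \in I_0} \lambda_i' x_i + \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} \Big( \xi - \sum_{i \in I_0} \lambda_i^0 x_i \Big) = \sum_{i \in I_0 \setminus \{i_0'\}} \Big( \lambda_i' - \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} \lambda_i^0 \Big) x_i + \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} \, \xi, \]

where

\[ \sum_{i \in I_0 \setminus \{i_0'\}} \Big( \lambda_i' - \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} \lambda_i^0 \Big) + \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} = (1 - \lambda_{i_0'}') - \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} (1 - \lambda_{i_0'}^0) + \frac{\lambda_{i_0'}'}{\lambda_{i_0'}^0} = 1. \]

Consequently $\xi' \in \operatorname{conv}(\Gamma_{I_0 \setminus \{i_0'\}} \cup \{\xi\})$. Now, $I_0$ being finite, it follows that, up to a new extraction, one may assume
that

\[ \xi_n \in \operatorname{conv}(\Gamma_{I_0 \setminus \{i_0\}} \cup \{\xi\}) \quad \text{for an } i_0 \in I_0. \]

Case 1. If $\xi \notin \operatorname{aff}(\Gamma_{I_0 \setminus \{i_0\}})$, then $\Gamma_{I_0 \setminus \{i_0\}} \cup \{\xi\}$ is affinely free and $\xi_n$ can be written uniquely as

\[ \xi_n = \mu^n \xi + \sum_{i \in I_0 \setminus \{i_0\}} \mu_i^n x_i, \]

a (convex) linear combination. Since $\xi_n \to \xi$, one has, owing to compactness and uniqueness
arguments, that $\mu_i^n \to 0$, $i \in I_0 \setminus \{i_0\}$, and $\mu^n \to 1$ as $n \to \infty$. One derives that

\[ \xi_n = \sum_{i \in I_0 \setminus \{i_0\}} \mu_i^n x_i + \sum_{j \in I^*} \mu^n \lambda_j^* x_j, \]

so that

\[ f_\Gamma(\xi_n) \le \sum_{i \in I_0 \setminus \{i_0\}} \mu_i^n \|\xi_n - x_i\|^p + \mu^n \sum_{i \in I^*} \lambda_i^* \|\xi_n - x_i\|^p, \]

which implies in turn

\[ \lim_n f_\Gamma(\xi_n) \le \sum_{i \in I_0 \setminus \{i_0\}} 0 + 1 \times f_\Gamma(\xi). \]

Case 2. If $\xi \in \operatorname{aff}(\Gamma_{I_0 \setminus \{i_0\}})$, then $\xi \in \operatorname{conv}(\Gamma_{I_0 \setminus \{i_0\}})$ by uniqueness of barycentric coordinates in
the affine basis $\Gamma_{I_0}$. Then $\xi_n, \xi \in \operatorname{conv}(\Gamma_{I_0 \setminus \{i_0\}})$ and we can repeat the above procedure to
reduce $I_0 \setminus \{i_0\}$ into $I_0 \setminus \{i_0, i_1\}$, and so on, until $\Gamma_{I_0 \setminus \{i_0, i_1, \dots, i_p\}} \cup \{\xi\}$ becomes affinely free. If so, the
same reasoning as above completes the proof. If it never occurs, this means that $\xi_n = \xi$ for every
$n \ge 1$, which trivially solves the problem. □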

We can now state the main result about the optimality regions $D_I(\Gamma)$.

Proposition 7. (a) For every $I \in \mathcal{I}(\Gamma)$, $\{x_j : j \in I\} \subset D_I(\Gamma) \subset \operatorname{conv}\{x_j : j \in I\}$, and $D_I(\Gamma)$ is
closed and therefore a Borel set.

(b) The family $(D_I(\Gamma))_{I \in \mathcal{I}(\Gamma)}$ makes up a Borel measurable covering of $\operatorname{conv}(\Gamma)$.

Proof. (a) The first inclusion is obvious (set $\xi = x_j$, $\lambda_j = 1$) and the second one follows directly
from the definition of $D_I(\Gamma)$. To recognize that $D_I(\Gamma)$ is closed, note that, owing to Theorem 1,
the mappings $\xi \mapsto \sum_{j \in I} \lambda_j(\xi) \|\xi - x_j\|^p$ and $\xi \mapsto F^p(\xi; \Gamma)$ are continuous.

(b) Since (LP) has a solution for every $\xi \in \operatorname{conv}(\Gamma)$, we derive from Proposition 4 that

\[ \bigcup_{I \in \mathcal{I}(\Gamma)} D_I(\Gamma) = \operatorname{conv}(\Gamma). \qquad \square \]



2.3 Intrinsic stationarity 

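To establish the link between the above definition of dual quantization and stationary quantization
rules, we have to make precise the notion of intrinsic stationarity.

Definition 3. (a) Let $\Gamma \subset \mathbb{R}^d$ be a finite subset of $\mathbb{R}^d$ and let $(\Omega_0, \mathcal{S}_0, \mathbb{P}_0)$ be a probability space.
Any random operator $\mathcal{J}_\Gamma : (\Omega_0 \times D, \mathcal{S}_0 \otimes \mathcal{B}or(D)) \to \Gamma$, $\operatorname{conv}(\Gamma) \subset D \subset \mathbb{R}^d$, is called a splitting
operator (onto $\Gamma$).
(b) A splitting operator on $\Gamma$ satisfying

\[ \forall \xi \in \operatorname{conv}(\Gamma), \qquad \mathbb{E}_{\mathbb{P}_0}(\mathcal{J}_\Gamma(\cdot, \xi)) = \int_{\Omega_0} \mathcal{J}_\Gamma(\omega_0, \xi) \, \mathbb{P}_0(d\omega_0) = \xi \]

is called an intrinsic stationary splitting operator.

We will see in the next paragraph that $(\Omega_0, \mathcal{S}_0, \mathbb{P}_0)$ can be modelled as an exogenous probability
space in order to randomly "split" (e.g. by simulation) a r.v. $X$, defined on the probability space
of interest $(\Omega, \mathcal{S}, \mathbb{P})$, between the points in $\Gamma$.

This new stationarity property is in fact equivalent to the dual stationarity property (9) on the
product space $(\Omega_0 \times \Omega, \mathcal{S}_0 \otimes \mathcal{S}, \mathbb{P}_0 \otimes \mathbb{P})$, as emphasized by the following easy proposition.

Proposition 8. Let $\operatorname{conv}(\Gamma) \subset D \subset \mathbb{R}^d$. A random splitting operator $\mathcal{J}_\Gamma : (\Omega_0 \times D, \mathcal{S}_0 \otimes
\mathcal{B}(D)) \to \Gamma$ is intrinsic stationary if and only if, for any r.v. $Y : (\Omega, \mathcal{S}, \mathbb{P}) \to (\mathbb{R}^d, \mathcal{B}^d)$ satisfying
$\operatorname{supp}(\mathbb{P}_Y) \subset \operatorname{conv}(\Gamma)$,

\[ \mathbb{E}_{\mathbb{P}_0 \otimes \mathbb{P}}(\mathcal{J}_\Gamma(Y) \mid Y) = Y \qquad \mathbb{P}_0 \otimes \mathbb{P}\text{-a.s.}, \tag{12} \]

where $\mathcal{J}_\Gamma$ and $Y$ are canonically extended onto $\Omega_0 \times \Omega$ by setting $\mathcal{J}_\Gamma((\omega_0, \omega), \cdot) = \mathcal{J}_\Gamma(\omega_0, \cdot)$ and
$Y(\omega_0, \omega) = Y(\omega)$.

Proof. The direct implication follows directly from Fubini's theorem and Definition 3. For the
reverse one, simply set $Y = \xi$. □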

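2.3.1 Dual quantization operator $\mathcal{J}^*_\Gamma$ and its interpolation counterpart $\mathbb{J}^*_\Gamma$

A way to define such an intrinsic stationary random splitting operator in an optimal manner is
provided by the dual quantization operator $\mathcal{J}^*_\Gamma$.

Therefore, let $\Gamma = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$, $n \in \mathbb{N}$, and assume that $\operatorname{aff.dim}(\Gamma) = d$. Otherwise the
dual quantization operator is not defined.

We then may choose a Borel partition $(C_I(\Gamma))_{I \in \mathcal{I}(\Gamma)}$ of $\operatorname{conv}(\Gamma)$ such that, for every $I \in \mathcal{I}(\Gamma)$,

\[ C_I(\Gamma) \subset D_I(\Gamma) = \Big\{ \xi \in \mathbb{R}^d : \lambda_I := A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} \ge 0 \text{ and } \sum_{j \in I} \lambda_j \|\xi - x_j\|^p = F^p(\xi; \Gamma) \Big\} \]

with the notations of (11). As a consequence, up to a reordering of rows, the Borel function

\[ \lambda^I(\xi) = \begin{bmatrix} A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} \\ 0 \end{bmatrix} \tag{13} \]

gives an optimal solution to $F^p(\xi; \Gamma)$ for every $\xi \in C_I(\Gamma)$.

Now we are in position to define the dual quantization operator.

Definition 4 (Dual quantization operator). Let $(\Omega_0, \mathcal{S}_0, \mathbb{P}_0) = ([0, 1], \mathcal{B}([0, 1]), \lambda^1)$ and let $U =
\operatorname{Id}_{[0,1]}$ be the canonical random variable with $U([0, 1])$ distribution over the unit interval. The
dual quantization operator $\mathcal{J}^*_\Gamma : \Omega_0 \times \operatorname{conv}(\Gamma) \to \Gamma$ is then defined for every $(\omega_0, \xi) \in \Omega_0 \times \operatorname{conv}(\Gamma)$ by

\[ \mathcal{J}^*_\Gamma(\omega_0, \xi) = \sum_{I \in \mathcal{I}(\Gamma)} \Big[ \sum_{i=1}^n x_i \, \mathbf{1}_{\big\{ \sum_{j=1}^{i-1} \lambda_j^I(\xi) \le U < \sum_{j=1}^{i} \lambda_j^I(\xi) \big\}}(\omega_0) \Big] \mathbf{1}_{C_I(\Gamma)}(\xi). \tag{14} \]

The dual quantization operator is clearly an intrinsic stationary splitting operator. First,

\[ \forall I \in \mathcal{I}(\Gamma),\ \forall i \in I, \qquad \mathbb{E}_{\mathbb{P}_0}\Big( \mathbf{1}_{\big\{ \sum_{j=1}^{i-1} \lambda_j^I(\xi) \le U < \sum_{j=1}^{i} \lambda_j^I(\xi) \big\}} \Big) = \lambda_i^I(\xi). \]

On the other hand, for every $\xi \in C_I(\Gamma)$,

\[ \sum_{i=1}^n \lambda_i^I(\xi) \, x_i = \xi, \]

so that $\mathcal{J}^*_\Gamma$ shares the intrinsic stationarity property:

\[ \forall \xi \in \operatorname{conv}(\Gamma), \qquad \mathbb{E}_{\mathbb{P}_0}(\mathcal{J}^*_\Gamma(\xi)) = \sum_{I \in \mathcal{I}(\Gamma)} \Big[ \sum_{i=1}^n \lambda_i^I(\xi) \, x_i \Big] \mathbf{1}_{C_I(\Gamma)}(\xi) = \xi. \]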
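In dimension one the construction becomes very concrete: the bases are pairs of neighbouring grid points and the uniform variable decides between the left and the right neighbour. The following sketch assumes a sorted grid and uses hypothetical function names; the Monte Carlo part relies on Python's `random` module with a fixed seed.

```python
import bisect
import random

def split_weights(grid, xi):
    """Left index i and weight lam such that xi = lam * grid[i] + (1 - lam) * grid[i + 1]."""
    if not grid[0] <= xi <= grid[-1]:
        raise ValueError("xi must lie in conv(grid)")
    i = min(max(bisect.bisect_right(grid, xi) - 1, 0), len(grid) - 2)
    lam = (grid[i + 1] - xi) / (grid[i + 1] - grid[i])
    return i, lam

def dual_quantize(grid, xi, u):
    """Send xi to its left neighbour when u falls in [0, lam), otherwise to its right neighbour."""
    i, lam = split_weights(grid, xi)
    return grid[i] if u < lam else grid[i + 1]

grid, xi = [0.0, 1.0, 3.0], 1.5
i, lam = split_weights(grid, xi)
exact_mean = lam * grid[i] + (1 - lam) * grid[i + 1]       # expectation over u

rng = random.Random(0)
draws = [dual_quantize(grid, xi, rng.random()) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
```

The exact mean equals `xi` for every point of the convex hull, whatever the grid, which is the intrinsic stationarity property; the empirical mean only fluctuates around it.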



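Remark. The $\mathcal{B}([0, 1]) \otimes \mathcal{B}(\operatorname{conv}(\Gamma))$-measurability of the dual quantization operator is an
easy consequence of the facts that the $C_I(\Gamma)$ are Borel sets and that $\xi \mapsto \lambda^I(\xi)$ as defined by (13) is a
continuous, hence Borel, function.

On the other hand, one easily checks that this construction also yields, with $I$ denoting the basis such that $\xi \in C_I(\Gamma)$,

\[ \forall \xi \in \operatorname{conv}(\Gamma), \qquad \mathbb{E}_{\mathbb{P}_0} \|\xi - \mathcal{J}^*_\Gamma(\xi)\|^p = \sum_{i=1}^n \lambda_i^I(\xi) \|x_i - \xi\|^p = F^p(\xi; \Gamma). \tag{15} \]

Definition 5 (Companion interpolation operator). The companion interpolation operator $\mathbb{J}^*_\Gamma$ is
defined from $\mathcal{F}(\operatorname{conv}(\Gamma), \mathbb{R}) = \{f : \operatorname{conv}(\Gamma) \to \mathbb{R}\}$ into itself by

\[ \mathbb{J}^*_\Gamma(F) = \mathbb{E}_{\mathbb{P}_0}\big( F(\mathcal{J}^*_\Gamma(\omega_0, \cdot)) \big) = \sum_{I \in \mathcal{I}(\Gamma)} \Big[ \sum_{i=1}^n \lambda_i^I(\cdot) \, F(x_i) \Big] \mathbf{1}_{C_I(\Gamma)}. \tag{16} \]

This operator $\mathbb{J}^*_\Gamma$ maps continuous functions into piecewise linear continuous functions, and one
clearly has

\[ \mathbb{J}^*_\Gamma(F)(X) = \mathbb{E}\big( F(\mathcal{J}^*_\Gamma(X)) \mid X \big), \]

so that $\mathbb{E}(\mathbb{J}^*_\Gamma(F)(X)) = \mathbb{E}(F(\mathcal{J}^*_\Gamma(X)))$.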
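In dimension one the companion interpolation operator reduces to ordinary piecewise affine interpolation between neighbouring grid points. A minimal sketch, assuming a sorted grid and using a hypothetical function name:

```python
import bisect

def interpolate(grid, values, xi):
    """lam * F(x_i) + (1 - lam) * F(x_{i+1}) on the cell [x_i, x_{i+1}] containing xi."""
    if not grid[0] <= xi <= grid[-1]:
        raise ValueError("xi must lie in conv(grid)")
    i = min(max(bisect.bisect_right(grid, xi) - 1, 0), len(grid) - 2)
    lam = (grid[i + 1] - xi) / (grid[i + 1] - grid[i])
    return lam * values[i] + (1 - lam) * values[i + 1]

grid = [0.0, 1.0, 3.0]
affine = [2 * x + 1 for x in grid]     # values of F(x) = 2x + 1 on the grid
square = [x * x for x in grid]         # values of F(x) = x^2 on the grid

at_affine = interpolate(grid, affine, 1.5)
at_square = interpolate(grid, square, 1.5)
```

Affine functions are reproduced exactly (`at_affine` equals $2 \cdot 1.5 + 1$), and the interpolant agrees with $F$ at the grid points; for the square function the interpolated value 3.0 differs from $F(1.5) = 2.25$.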



Change of notation. From now on, we switch to the product space $(\Omega_0 \times \Omega, \mathcal{S}_0 \otimes \mathcal{S}, \mathbb{P}_0 \otimes \mathbb{P})$.
(However, if there is no ambiguity, we will still use the symbols $\mathbb{P}$ and $\mathbb{E}$ to denote the probability and
the expectation on this product space.) Doing so, we may assume that the intrinsic stationary
splitting operator is independent of any "endogenous" r.v. defined on $(\Omega, \mathcal{S}, \mathbb{P})$, canonically
extended to $(\Omega_0 \times \Omega, \mathcal{S}_0 \otimes \mathcal{S}, \mathbb{P}_0 \otimes \mathbb{P})$ (which implies that the stationarity property (12) holds).

2.3.2 Characterizations of the optimal dual quantization error 

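We use this operator to prove the analogous theorem for dual quantization to Proposition 1.
Theorem 2. Let $X : (\Omega, \mathcal{S}, \mathbb{P}) \to \mathbb{R}^d$ be a r.v., let $p \in [1, \infty)$ and let $n \in \mathbb{N}$. Then

\[ d_n^p(X) = \inf\{ \mathbb{E}\|X - \mathcal{J}_\Gamma(X)\|^p : \mathcal{J}_\Gamma : \Omega_0 \times \mathbb{R}^d \to \Gamma,\ \Gamma \subset \mathbb{R}^d, \text{ intrinsic stationary},\ \operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma),\ |\Gamma| \le n \} \]
\[ \phantom{d_n^p(X)} = \inf\{ \mathbb{E}\|X - \widehat Y\|^p : \widehat Y : (\Omega_0 \times \Omega, \mathcal{S}_0 \otimes \mathcal{S}, \mathbb{P}_0 \otimes \mathbb{P}) \to \mathbb{R}^d,\ |\widehat Y(\Omega_0 \times \Omega)| \le n,\ \mathbb{E}_{\mathbb{P} \otimes \mathbb{P}_0}(\widehat Y \mid X) = X \ \ \mathbb{P} \otimes \mathbb{P}_0\text{-a.s.} \} \le +\infty. \]

These quantities are finite iff $X \in L^\infty(\Omega, \mathcal{S}, \mathbb{P})$ and $n \ge d + 1$.
Proof. First we show the inequality

\[ d_n^p(X) \ge \inf\{ \mathbb{E}\|X - \mathcal{J}_\Gamma(X)\|^p : \mathcal{J}_\Gamma : \Omega_0 \times \mathbb{R}^d \to \Gamma \text{ is intrinsic stationary},\ \operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma),\ \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n \}. \]

We may assume that $d_n^p(X) < +\infty$, which implies the existence of a grid $\Gamma \subset \mathbb{R}^d$ with $|\Gamma| \le n$
and $d^p(X; \Gamma) < +\infty$, so that Proposition 3 implies $\operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma)$.

Hence, we choose a Borel partition $(C_I(\Gamma))_{I \in \mathcal{I}(\Gamma)}$ of $\operatorname{conv}(\Gamma)$ with $C_I(\Gamma) \subset D_I(\Gamma)$, $I \in \mathcal{I}(\Gamma)$, so
that the dual quantization operator $\mathcal{J}^*_\Gamma$ is well defined by (14) on $\operatorname{conv}(\Gamma)$. Let us still denote by
$\mathcal{J}^*_\Gamma$ its Borel extension by 0 outside $\operatorname{conv}(\Gamma)$.
Owing to the independence of $X$ and $\mathcal{J}^*_\Gamma$ on $\Omega_0 \times \Omega$, it holds

\[ \mathbb{E}\big( \|\xi - \mathcal{J}^*_\Gamma(\xi)\|^p \big)\big|_{\xi = X} = \mathbb{E}\big( \|X - \mathcal{J}^*_\Gamma(X)\|^p \mid X \big) \quad \text{a.s.}, \]

so that we conclude from (15)

\[ \mathbb{E} F^p(X; \Gamma) = \mathbb{E}\big[ \mathbb{E}(F^p(X; \Gamma) \mid X) \big] = \mathbb{E}\big[ \mathbb{E}(F^p(\xi; \Gamma))\big|_{\xi = X} \big] \]
\[ = \mathbb{E}\big[ \mathbb{E}( \|\xi - \mathcal{J}^*_\Gamma(\xi)\|^p )\big|_{\xi = X} \big] = \mathbb{E}\big[ \mathbb{E}( \|X - \mathcal{J}^*_\Gamma(X)\|^p \mid X ) \big] \]
\[ = \mathbb{E}\|X - \mathcal{J}^*_\Gamma(X)\|^p. \]

Since $\mathcal{J}^*_\Gamma$ is intrinsic stationary by construction, the first inequality holds.
The second inequality

\[ \inf\{ \mathbb{E}\|X - \mathcal{J}_\Gamma(X)\|^p : \mathcal{J}_\Gamma \text{ is intrinsic stationary},\ \operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma),\ |\Gamma| \le n \} \]
\[ \ge \inf\{ \mathbb{E}\|X - Y\|^p : Y \text{ is a r.v.},\ |Y(\Omega_0 \times \Omega)| \le n,\ \mathbb{E}(Y \mid X) = X \} \]

follows directly from setting $Y = \mathcal{J}_\Gamma(X)$ in the case where $\mathcal{J}_\Gamma$ exists and $\operatorname{supp}(\mathbb{P}_X) \subset \operatorname{conv}(\Gamma)$. Otherwise,
there is nothing to show.

To prove the reverse inequality, let us consider a r.v. $Y$ on $\Omega_0 \times \Omega$ such that $|Y(\Omega_0 \times \Omega)| \le n$ and

\[ \mathbb{E}(Y \mid X) = X \quad \text{a.s.} \]

Such r.v. do exist owing to what precedes. Let $Y(\Omega_0 \times \Omega) = \{y_1, \dots, y_k\}$ with $k \le n$ and let

\[ \lambda_i = \big( \xi \mapsto \mathbb{P}_0 \otimes \mathbb{P}(Y = y_i \mid X = \xi) \big) \circ X, \qquad 1 \le i \le k, \]

where the above mapping denotes a regular version of the conditional expectation on $\mathbb{R}^d$ (so
that $\lambda_i$ is $\mathcal{S}_0 \otimes \mathcal{S}$-measurable), $i = 1, \dots, k$.
Hence, there exists a null set $N \in \mathcal{S}_0 \otimes \mathcal{S}$ such that

\[ \forall \bar\omega = (\omega_0, \omega) \in N^c, \qquad \begin{cases} \sum_{i=1}^k \lambda_i(\bar\omega) \, y_i = \mathbb{E}(Y \mid X)(\bar\omega) = X(\bar\omega), \\ \sum_{i=1}^k \lambda_i(\bar\omega) = 1, \\ \lambda_i(\bar\omega) \in [0, 1],\ 1 \le i \le k. \end{cases} \]

Setting $\Gamma = \{y_1, \dots, y_k\}$, we get for every $\bar\omega \in N^c$

\[ \mathbb{E}(\|X - Y\|^p \mid X)(\bar\omega) = \sum_{i=1}^k \lambda_i(\bar\omega) \, \mathbb{E}(\|X - y_i\|^p \mid X)(\bar\omega) = \sum_{i=1}^k \lambda_i(\bar\omega) \, \|X(\bar\omega) - y_i\|^p \ge F^p(X(\bar\omega); \Gamma). \]

Taking the expectation completes the proof. □
Remark. We necessarily need to define $Y$ on the larger product probability space $(\Omega_0 \times \Omega, \mathcal{S}_0 \otimes
\mathcal{S}, \mathbb{P}_0 \otimes \mathbb{P})$ rather than only on $(\Omega, \mathcal{S}, \mathbb{P})$, since $\mathcal{S}$ might not be fine enough to contain appropriate
r.v.s $Y$ satisfying $\mathbb{E}(Y \mid X) = X$. E.g., if $\mathcal{S} = \sigma(X)$, $Y$ would be $\sigma(X)$-measurable, so that
$\mathbb{E}(Y \mid X) = Y$ and intrinsic stationarity would become unreachable for a general finite-valued r.v. $Y$.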

2.3.3 Applications of intrinsic stationarity to cubature formulas 

As a consequence of the above Theorem [5] we get the following theorem about cubature by dual 
quantization. 

First, one must keep in mind, as concerns the functional approximation interpretation and numerical
integration, that $\mathbb{E}F(J_\Gamma^*(X)) = \mathbb{E}(\mathcal{J}_\Gamma^*(F)(X))$ and that the second expression, based on the
interpolation formula (16), may be more intuitive although, once the weights

$$p_i = \mathbb{P}(J_\Gamma^*(X) = x_i), \quad i = 1, \ldots, n,$$

have been computed "off line", the cubature formula is of course more efficient in its aggregated
form corresponding to $\mathbb{E}F(J_\Gamma^*(X))$. It is straightforward that if $F : \mathbb{R}^d \to \mathbb{R}$ is $\alpha$-Hölder continuous
on $\mathrm{conv}(\Gamma)$, then (with obvious notations), if $\mathrm{conv}(\Gamma) \supset \mathrm{supp}(\mathbb{P}_X)$,

$$|\mathbb{E}F(X) - \mathbb{E}(\mathcal{J}_\Gamma^*(F)(X))| = |\mathbb{E}F(X) - \mathbb{E}F(J_\Gamma^*(X))| \le [F]_{Lip}\,\mathbb{E}\|X - J_\Gamma^*(X)\|.$$

One may go further like with Voronoi quantization when F is smoother, taking advantage of the 
stationarity property (satisfied here by any grid). 

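Proposition 9. Let $X : (\Omega, \mathcal{S}) \to \mathbb{R}^d$ be a r.v. with a compactly supported distribution $\mathbb{P}_X$. Let
$\Gamma = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$ be a grid with $\mathrm{conv}(\Gamma) \supset \mathrm{supp}(\mathbb{P}_X)$. Then for every function $F : \mathbb{R}^d \to \mathbb{R}$,
differentiable in the neighbourhood of $\mathrm{conv}(\Gamma)$, with Lipschitz continuous partial derivatives on
$\mathrm{conv}(\Gamma)$, it holds for the cubature formula $\mathbb{E}F(J_\Gamma^*(X)) = \sum_{i=1}^n p_i\,F(x_i)$

$$|\mathbb{E}F(X) - \mathbb{E}(\mathcal{J}_\Gamma^*(F)(X))| = |\mathbb{E}F(X) - \mathbb{E}F(J_\Gamma^*(X))| \le [F']_{Lip}\,\mathbb{E}\|X - J_\Gamma^*(X)\|^2.$$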

Proof. The result follows straightforwardly from taking the expectation in the Taylor expansion
of $F$ at $X$ at the second order, namely

$$|F(J_\Gamma^*(X)) - F(X) - F'(X).(J_\Gamma^*(X) - X)| \le [F']_{Lip}\,\|X - J_\Gamma^*(X)\|^2,$$

and applying the stationarity property $\mathbb{E}(J_\Gamma^*(X) - X \mid X) = 0$. □
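As a minimal numerical sketch of this cubature formula (an illustration, not code from the paper), consider the one-dimensional case with $X$ uniform on $[0,1]$: the splitting operator sends $\xi \in [x_i, x_{i+1}]$ to $x_i$ with probability $(x_{i+1}-\xi)/(x_{i+1}-x_i)$ and to $x_{i+1}$ otherwise. The function names `split_weights` and `dual_weights_uniform`, the grid and the integrand $F(x) = x^2$ are assumptions chosen for the example.

```python
import bisect

def split_weights(grid, xi):
    """Return (i, lam) with xi = lam*grid[i] + (1-lam)*grid[i+1] on a sorted 1D grid."""
    if not grid[0] <= xi <= grid[-1]:
        raise ValueError("xi must lie in conv(grid)")
    i = min(max(bisect.bisect_right(grid, xi) - 1, 0), len(grid) - 2)
    lam = (grid[i + 1] - xi) / (grid[i + 1] - grid[i])
    return i, lam

def dual_weights_uniform(grid):
    """Weights p_i = P(J(X) = x_i) when X is uniform on [grid[0], grid[-1]]."""
    total = grid[-1] - grid[0]
    p = [0.0] * len(grid)
    for i in range(len(grid) - 1):
        h = grid[i + 1] - grid[i]
        # each of the two barycentric weights integrates to h/2 over the cell
        p[i] += 0.5 * h / total
        p[i + 1] += 0.5 * h / total
    return p

grid = [0.0, 0.1, 0.35, 0.6, 1.0]   # deliberately non-optimal grid
F = lambda x: x * x                 # F' is Lipschitz with constant 2
p = dual_weights_uniform(grid)
cubature = sum(pi * F(xi) for pi, xi in zip(p, grid))
exact = 1.0 / 3.0                   # E F(X) for X ~ U([0, 1])
# E|X - J(X)|^2 equals the sum of h^3/6 over the cells for X ~ U([0, 1])
quad_error = sum((b - a) ** 3 for a, b in zip(grid, grid[1:])) / 6.0
print(cubature - exact, 2.0 * quad_error)
```

In this example the cubature error stays below $[F']_{Lip}\,\mathbb{E}|X - J_\Gamma^*(X)|^2$ although the grid is not optimal, which is the point of the proposition.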

Now assume that the integrand $F$ is a convex function. If $\hat X^\Gamma$ is a Voronoi quantization which
satisfies the regular stationarity property $\mathbb{E}(X \mid \hat X^\Gamma) = \hat X^\Gamma$, it follows from Jensen's inequality
that $\mathbb{E}F(\hat X^\Gamma)$ yields a lower bound for the approximation of $\mathbb{E}F(X)$.

By contrast, exploiting the intrinsic stationarity of $J_\Gamma^*$, a cubature formula based on
$J_\Gamma^*$ yields for convex functions $F$ an upper bound, which is now valid for any grid $\Gamma \subset \mathbb{R}^d$.

Proposition 10. Let $X$ and $\Gamma$ be like in Proposition 9. Assume that $F : \mathrm{conv}(\Gamma) \to \mathbb{R}$ is convex.
Then $\mathcal{J}_\Gamma^*(F)$ defines a convex function on $\mathrm{conv}(\Gamma)$ satisfying $\mathcal{J}_\Gamma^*(F) \ge F$. In particular

$$\mathbb{E}(\mathcal{J}_\Gamma^*(F)(X)) \ge \mathbb{E}F(X).$$

Proof. The inequality $\mathcal{J}_\Gamma^*(F) \ge F$ follows from the very definition (16) of $\mathcal{J}_\Gamma^*$. Its convexity is a
consequence of its affinity on each $d$-simplex $C_I(\Gamma)$, and its coincidence with $F$ on $\Gamma$. □
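The two-sided bracket for convex integrands can be sketched numerically as follows. This is an illustrative example under stated assumptions: $X$ is uniform on $[0,1]$, the Voronoi quantizer is the midpoint grid $(2i-1)/(2n)$ with weights $1/n$ (which is stationary because each point is the centroid of its cell), the dual quantizer uses the grid $(i-1)/(n-1)$, and the function names are hypothetical.

```python
import math

def voronoi_cubature_uniform(F, n):
    """Stationary Voronoi quantizer of U([0,1]): points (2i-1)/(2n), weights 1/n."""
    return sum(F((2 * i - 1) / (2.0 * n)) for i in range(1, n + 1)) / n

def dual_cubature_uniform(F, n):
    """Dual quantizer of U([0,1]) on the grid (i-1)/(n-1): trapezoidal weights."""
    h = 1.0 / (n - 1)
    inner = sum(F(i * h) for i in range(1, n - 1))
    return h * (0.5 * F(0.0) + inner + 0.5 * F(1.0))

exact = math.e - 1.0                      # E exp(X) for X ~ U([0, 1])
lower = voronoi_cubature_uniform(math.exp, 10)
upper = dual_cubature_uniform(math.exp, 10)
print(lower, exact, upper)
```

For the convex function $\exp$, the Voronoi value sits below the true expectation and the dual value above it, so the pair brackets $\mathbb{E}F(X)$.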

Application to convex order. Dual quantization preserves the convex order on $\mathrm{conv}(\Gamma)$: if
$X$ and $Y$ are two r.v. a.s. taking values in $\mathrm{conv}(\Gamma)$ such that $X \le_c Y$, i.e. for every convex
function $\varphi : \mathrm{conv}(\Gamma) \to \mathbb{R}$, $\mathbb{E}\varphi(X) \le \mathbb{E}\varphi(Y)$, then $J_\Gamma^*(X) \le_c J_\Gamma^*(Y)$.
2.4 Upper bounds and product quantization

Proposition 11 (Scalar bound). Let $\Gamma = \{x_1, \ldots, x_n\} \subset \mathbb{R}$ with $x_1 < \ldots < x_n$. Then

$$\forall \xi \in [x_1, x_n], \quad F^p(\xi; \Gamma) \le \max_{1 \le i \le n-1} \Big(\frac{x_{i+1} - x_i}{2}\Big)^p.$$

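Proof. If $\xi \in \Gamma$, then $F^p(\xi; \Gamma) = 0$ and the assertion holds. Suppose now $\xi \in (x_i, x_{i+1})$. Then
$\xi = \lambda x_i + (1-\lambda) x_{i+1}$ and $\lambda = \frac{x_{i+1} - \xi}{x_{i+1} - x_i}$, so that

$$F^p(\xi; \Gamma) \le \frac{x_{i+1} - \xi}{x_{i+1} - x_i}\,|\xi - x_i|^p + \frac{\xi - x_i}{x_{i+1} - x_i}\,|\xi - x_{i+1}|^p$$

attains its maximum at $\xi = \frac{x_i + x_{i+1}}{2}$. This implies

$$F^p(\xi; \Gamma) \le \Big(\frac{x_{i+1} - x_i}{2}\Big)^p,$$

which yields the assertion. □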
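The scalar bound can be checked numerically on a small grid. The sketch below is illustrative only: it evaluates the two-neighbour split used in the proof on a fine mesh of points $\xi$, for the exponents $p = 1$ and $p = 2$ chosen here, and compares the largest value with the right-hand side of the bound. The grid and the function name `local_dual_error_1d` are assumptions.

```python
def local_dual_error_1d(grid, xi, p):
    """Value of the two-neighbour split at xi in [grid[0], grid[-1]] (sorted grid)."""
    for a, b in zip(grid, grid[1:]):
        if a <= xi <= b:
            lam = (b - xi) / (b - a)          # weight put on the left neighbour
            return lam * abs(xi - a) ** p + (1.0 - lam) * abs(xi - b) ** p
    raise ValueError("xi must lie in conv(grid)")

grid = [0.0, 0.2, 0.5, 1.0]
checks = {}
for p in (1, 2):
    bound = max(((b - a) / 2.0) ** p for a, b in zip(grid, grid[1:]))
    worst = max(local_dual_error_1d(grid, k / 1000.0, p) for k in range(1001))
    checks[p] = (worst, bound)
print(checks)
```

For these two exponents the maximum is reached at the midpoint of the widest cell, here $\xi = 0.75$.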

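Proposition 12 (Local product quantization). Let $\|\cdot\| = |\cdot|_p$ be the canonical $p$-norm on $\mathbb{R}^d$,
$\xi = (\xi_1, \ldots, \xi_d) \in \mathbb{R}^d$ and $\Gamma = \prod_{j=1}^d \alpha_j$ for some finite subsets $\alpha_j \subset \mathbb{R}$. Then

$$F^p(\xi; \Gamma) = \sum_{j=1}^d F^p(\xi_j; \alpha_j).$$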

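Proof. Denoting $\alpha_j = \{a_1^j, \ldots, a_{n_j}^j\}$, $\Gamma = \{x_1, \ldots, x_n\}$ and due to the fact that $\{x_1, \ldots, x_n\}$ is
made up by the cartesian product of $\{a_1^j, \ldots, a_{n_j}^j\}$, $j = 1, \ldots, d$, we have for any $u, \xi \in \mathbb{R}^d$

$$\min_{1 \le i \le n} \big\{\|\xi - x_i\|^p + u^T(\xi - x_i)\big\} = \sum_{j=1}^d \min_{1 \le i \le n_j} \big\{|\xi_j - a_i^j|^p + u_j(\xi_j - a_i^j)\big\}.$$

We then get from Proposition 5

$$F^p(\xi; \Gamma) = \max_{u \in \mathbb{R}^d} \min_{1 \le i \le n} \big\{\|\xi - x_i\|^p + u^T(\xi - x_i)\big\}$$
$$= \max_{u \in \mathbb{R}^d} \sum_{j=1}^d \min_{1 \le i \le n_j} \big\{|\xi_j - a_i^j|^p + u_j(\xi_j - a_i^j)\big\}$$
$$= \sum_{j=1}^d \max_{u_j \in \mathbb{R}} \min_{1 \le i \le n_j} \big\{|\xi_j - a_i^j|^p + u_j(\xi_j - a_i^j)\big\} = \sum_{j=1}^d F^p(\xi_j; \alpha_j),$$

which completes the proof. □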
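The local product formula can be illustrated in dimension $d = 2$ with $p = 2$. The sketch below is an assumption-laden example, not the paper's algorithm: it computes $F^2(\xi;\Gamma)$ by brute force over all triangles of grid points containing $\xi$ (the linear program attains its minimum at such a basic solution) and compares it with the sum of the two scalar values. All function names are hypothetical.

```python
from itertools import combinations, product

def barycentric(tri, xi):
    """Barycentric coordinates of xi w.r.t. a 2D triangle, or None if degenerate."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None
    l2 = ((xi[0] - x1) * (y3 - y1) - (x3 - x1) * (xi[1] - y1)) / det
    l3 = ((x2 - x1) * (xi[1] - y1) - (xi[0] - x1) * (y2 - y1)) / det
    return (1.0 - l2 - l3, l2, l3)

def local_dual_error_2d(points, xi, p=2):
    """Brute-force F^p(xi; points) for the p-norm: best triangle containing xi."""
    best = float("inf")
    for tri in combinations(points, 3):
        lam = barycentric(tri, xi)
        if lam is None or min(lam) < -1e-12:
            continue                      # xi is not in this triangle
        cost = sum(l * (abs(xi[0] - x) ** p + abs(xi[1] - y) ** p)
                   for l, (x, y) in zip(lam, tri))
        best = min(best, cost)
    return best

def local_dual_error_1d(alpha, t, p=2):
    """F^p(t; alpha) for a sorted scalar grid, via the two neighbours of t."""
    for a, b in zip(alpha, alpha[1:]):
        if a <= t <= b:
            lam = (b - t) / (b - a)
            return lam * abs(t - a) ** p + (1.0 - lam) * abs(t - b) ** p
    raise ValueError("t must lie in conv(alpha)")

alpha1, alpha2 = [0.0, 0.5, 1.0], [0.0, 1.0]
grid = list(product(alpha1, alpha2))      # product grid alpha1 x alpha2
xi = (0.3, 0.7)
lhs = local_dual_error_2d(grid, xi)
rhs = local_dual_error_1d(alpha1, xi[0]) + local_dual_error_1d(alpha2, xi[1])
print(lhs, rhs)
```

Brute force over triangles is only practical for tiny grids; it is used here because it needs no linear programming library.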

This enables us to derive a first upper bound for the asymptotics of the optimal dual quantization 
error of distributions with bounded support when the size of the grid tends to infinity. 

Proposition 13 (Product quantization). Let $C = a + \ell[0,1]^d$, $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$, $\ell > 0$,
be a hypercube, parallel to the coordinate axes, with common edge length $\ell$. Let $\Gamma$ be the product
quantizer of size $(m+1)^d$ defined by

$$\Gamma = \prod_{j=1}^d \Big\{a_j + \frac{i\,\ell}{m},\ i = 0, \ldots, m\Big\}.$$

Then it holds

$$\forall \xi \in C, \quad F^p(\xi; \Gamma) \le d \cdot C_{p,\|\cdot\|} \cdot \Big(\frac{\ell}{2}\Big)^p \cdot m^{-p} \tag{18}$$

where $C_{p,\|\cdot\|} = \sup_{|x|_p = 1} \|x\|^p > 0$. Moreover, for any compactly supported r.v. $X$

$$d_{n,p}(X) = O(n^{-1/d}).$$

Proof. The first claim follows directly from Propositions 11 and 12. For the second assertion let
$n \ge 2^d$ and set $m = \lfloor n^{1/d} \rfloor - 1$. If we choose the hypercube $C$ such that $\mathrm{supp}(\mathbb{P}_X) \subset C$ we arrive
owing to (18) at

$$d_n^p(X) \le C_1 \Big(\frac{1}{m}\Big)^p \le C_2 \Big(\frac{1}{n}\Big)^{p/d}$$

for some constants $C_1, C_2 > 0$, which yields the desired upper bound. □

2.5 Extension for distributions with unbounded support 

We have seen in the previous sections that $F^p(\xi; \Gamma)$ is finite if and only if $\xi \in \mathrm{conv}(\Gamma)$, so that
intrinsic stationarity cannot hold for a r.v. $X$ with unbounded support.

Nevertheless, we may restrict the stationarity requirement in the definition of the dual quantization
error for unbounded $X$ to its "natural domain" $\mathrm{conv}(\Gamma)$, which means that from now on we
will drop the constraint $\mathrm{supp}(\mathbb{P}_X) \subset \mathrm{conv}(\Gamma)$ in Theorem 2.

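Definition 6. The random splitting operator is canonically extended to the whole $\mathbb{R}^d$ by setting

$$\forall \omega_0 \in \Omega_0,\ \forall \xi \notin \mathrm{conv}(\Gamma), \quad J_\Gamma^*(\omega_0, \xi) = \pi_\Gamma(\xi)$$

where $\pi_\Gamma$ denotes a Borel nearest neighbour projection on $\Gamma$. Subsequently we define the extended
$L^p$-mean dual quantization error as

$$\bar d_n^{\,p}(X) = \inf\{\mathbb{E}\|X - J_\Gamma(X)\|^p : J_\Gamma : \Omega_0 \times \mathbb{R}^d \to \Gamma \text{ is intrinsic stationary},\ \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n\}.$$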

Remark. When dealing with the Euclidean norm, a (continuous) alternative is to set $\bar J_\Gamma^*(\omega_0, \xi) = J_\Gamma^*(\omega_0, \mathrm{Proj}_{\mathrm{conv}(\Gamma)}(\xi))$
but, although looking more natural from a geometrical point of view, it
provides no numerical improvement for applications and induces additional technicalities (especially
for the existence of optimal quantizers and the counterpart of Zador's theorem).

Combining Proposition [?] and Theorem 2 and keeping in mind that outside $\mathrm{conv}(\Gamma)$, $\|\xi - J_\Gamma^*(\xi)\| = \mathrm{dist}(\xi, \Gamma)$,
we get the following proposition.

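Proposition 14. Let $X \in L^p_{\mathbb{R}^d}(\mathbb{P})$. Then $\bar d_n^{\,p}(X) = \inf\{\mathbb{E}\bar F^p(X; \Gamma) : \Gamma \subset \mathbb{R}^d,\ |\Gamma| \le n\}$ where

$$\bar F^p(\xi; \Gamma) = F^p(\xi; \Gamma)\,1_{\mathrm{conv}(\Gamma)}(\xi) + \|\xi - \pi_\Gamma(\xi)\|^p\,1_{\mathrm{conv}(\Gamma)^c}(\xi).$$

Note that, owing to Proposition 3, we have for any $X \in L^p_{\mathbb{R}^d}(\mathbb{P})$

$$\bar d_n^{\,p}(X) \le d_n^p(X),$$

where equality does not hold in general even for compactly supported r.v. $X$ although it is shown
in the companion paper [12] that both quantities coincide asymptotically in the bounded case.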
2.6 Rate of convergence: Zador's Theorem for dual quantization

In the companion paper [12], we establish the following theorem which looks formally identical
to the celebrated Zador Theorem for regular vector quantization.

Theorem 3. (a) Let $X \in L^{p+\delta}(\mathbb{P})$, $\delta > 0$, absolutely continuous w.r.t. the Lebesgue measure
on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ and $\mathbb{P}_X = h.\lambda_d$. Then

$$\lim_{n \to \infty} n^{1/d}\,\bar d_{n,p}(X) = Q_{d,p,\|\cdot\|} \cdot \|h\|_{d/(d+p)}^{1/p}$$

where

$$Q_{d,p,\|\cdot\|} = \lim_{n \to \infty} n^{1/d}\, d_{n,p}(U([0,1]^d)) = \inf_{n \ge 1} n^{1/d}\, d_{n,p}(U([0,1]^d)).$$

This constant satisfies $Q_{d,p,\|\cdot\|} \ge Q^{vq}_{d,p,\|\cdot\|}$, where $Q^{vq}_{d,p,\|\cdot\|}$ denotes the asymptotic constant for the
sharp Voronoi vector quantization rate of the uniform distribution over $[0,1]^d$, i.e.

$$Q^{vq}_{d,p,\|\cdot\|} = \lim_{n \to \infty} n^{1/d}\, e_{n,p}(U([0,1]^d)) = \inf_{n \ge 1} n^{1/d}\, e_{n,p}(U([0,1]^d)).$$

Furthermore, when $d = 1$ we know that $Q_{1,p,|\cdot|} = \Big(\frac{2^{p+1}}{p+2}\Big)^{1/p} Q^{vq}_{1,p,|\cdot|}$.

(b) When $X$ has a compact support the above sharp rate holds for $d_{n,p}(X)$ as well.

We also establish the following non-asymptotic upper-bound (at the exact rate). 

Proposition 15 ($d$-dimensional extended Pierce Lemma). Let $p, \eta > 0$. There exists an integer
$n_{d,p,\eta} \ge 1$ and a real constant $C_{d,p,\eta}$ such that, for every $n \ge n_{d,p,\eta}$ and every random variable
$X \in L^{p+\eta}_{\mathbb{R}^d}(\Omega, \mathcal{A}, \mathbb{P})$,

$$\bar d_{n,p}(X) \le C_{d,p,\eta}\,\sigma_{p+\eta,\|\cdot\|}(X)\, n^{-1/d}$$

where $\sigma_{p+\eta,\|\cdot\|}(X) = \inf_{a \in \mathbb{R}^d} \|X - a\|_{L^{p+\eta}}$.

If $\mathrm{supp}(\mathbb{P}_X)$ is compact then the same inequality holds true for $d_{n,p}(X)$.



3 Quadratic Euclidean case and Delaunay Triangulation 

In the case that $(\mathbb{R}^d, \|\cdot\|)$ is the Euclidean space and $p = 2$, the optimality regions $D_I(\Gamma)$ have
either empty interior or are maximal, i.e. $\mathring{D}_I(\Gamma) = \emptyset$ or $D_I(\Gamma) = \mathrm{conv}\{x_j : j \in I\}$. This follows
from the fact that in the quadratic Euclidean case the dual feasibility of a basis (index set)
$I \in \mathcal{I}(\Gamma)$ with respect to a given $\xi$ is locally constant outside the median hyperplanes defined by
pairs of points of $\Gamma$.

This feature is also the key to the following theorem, which was first proved by Rajan in [15] and
establishes the link between a solution to $F^2(\xi; \Gamma)$ (the so-called power function in [15]) and the
Delaunay property of a triangle.

Recall that a triangle (or $d$-simplex) $\mathrm{conv}\{x_{i_1}, \ldots, x_{i_{d+1}}\}$ spanned by a set of points belonging to
$\Gamma = \{x_1, \ldots, x_k\}$, $k \ge d+1$, has the Delaunay property if the sphere spanned by $\{x_{i_1}, \ldots, x_{i_{d+1}}\}$
contains no point of $\Gamma$ in its interior.

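Theorem 4. Let $\|\cdot\| = |\cdot|_2$ be the Euclidean norm, $p = 2$, and $\Gamma = \{x_1, \ldots, x_k\} \subset \mathbb{R}^d$ with
$\mathrm{aff.dim}\{\Gamma\} = d$.

(a) If $I \in \mathcal{I}(\Gamma)$ defines a Delaunay triangle (or $d$-simplex), then

$$\lambda_I = A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix}$$

provides a solution to (LP) for every $\xi \in \mathrm{conv}\{x_j : j \in I\}$.
In particular, this implies $D_I(\Gamma) = \mathrm{conv}\{x_j : j \in I\}$.

(b) If $I \in \mathcal{I}(\Gamma)$ satisfies $\mathring{D}_I(\Gamma) \ne \emptyset$, then the triangle (or $d$-simplex) defined by $I$ has the Delaunay
property for $\Gamma$.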
We provide here a short proof based on the duality for Linear Programming (see Theorem p. 93
and the remarks that follow in [?]), only for the reader's convenience.

Proof. First note that $I \in \mathcal{I}(\Gamma)$ defines a Delaunay triangle (or $d$-simplex) if there exists a
center $z \in \mathbb{R}^d$ such that for every $j \in I$

$$|z - x_j|_2 \le |z - x_i|_2, \quad 1 \le i \le k, \tag{19}$$

and equality holds for $i \in I$. Suppose that $z = \xi + \frac{u_1}{2}$. Then

$$\forall i, \quad |z - x_i|_2^2 = |\xi - x_i|_2^2 + \xi^T u_1 - x_i^T u_1 + \frac{|u_1|_2^2}{4},$$

so that (19) is equivalent to

$$\begin{cases} |\xi - x_j|_2^2 - x_j^T u_1 \le |\xi - x_i|_2^2 - x_i^T u_1, & 1 \le i \le k,\ j \in I, \\ u_2 = |\xi - x_j|_2^2 - x_j^T u_1, & j \in I. \end{cases} \tag{20}$$

Note that this is exactly the dual feasibility condition of Proposition 5.

(a) Now let $I \in \mathcal{I}(\Gamma)$ be such that $\{x_j : j \in I\}$ defines a Delaunay triangle. We denote by $z \in \mathbb{R}^d$
the center of the sphere spanned by $\{x_j : j \in I\}$; let $j \in I$ be a fixed (arbitrary) index in what
follows. For every $\xi \in \mathbb{R}^d$, we define $u = u(\xi) = (u_1, u_2)$ as

$$u_1 = 2(z - \xi) \quad \text{and} \quad u_2 = |\xi - x_j|_2^2 - x_j^T u_1.$$

Consequently $z = \xi + \frac{u_1}{2}$, so that $u$ is dual feasible for (LP) owing to what precedes.

Since $\lambda_I = A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} \ge 0$ iff $\xi \in \mathrm{conv}\{x_j : j \in I\}$, Proposition 6(a) then yields that $\lambda_I$ provides
an optimal solution to (LP) for any $\xi \in \mathrm{conv}\{x_j : j \in I\}$.

(b) Let $I \in \mathcal{I}(\Gamma)$ and choose some $\xi \in \mathring{D}_I(\Gamma)$. Then Proposition 7(a) implies $\xi \in \mathrm{conv}\{x_j : j \in I\}$.
As a consequence, it holds $A_I^{-1} \begin{bmatrix} \xi \\ 1 \end{bmatrix} = \lambda_I > 0$, so that we conclude from Proposition 6(b) that the
unique dual solution to (LP) is given by $\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = (A_I^T)^{-1} c_I$. Since moreover $A^T \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \le c$, $(u_1, u_2)$
satisfies (20) so that

$$z = \xi + \frac{u_1}{2}$$

is the center of a Delaunay triangle containing $\xi$ in its interior. □

Consequently, if a grid $\Gamma \subset \mathbb{R}^d$ exhibits a Delaunay triangulation, the dual quantization operator
$J_\Gamma^*$ is (up to the triangles' borders) uniquely defined and maps any $\xi \in \mathrm{conv}(\Gamma)$ to the vertices of
the Delaunay triangle in which $\xi$ lies.

This yields a duality relation between $J_\Gamma^*$ and the nearest neighbor projection $\pi_\Gamma$ since the Voronoi
tessellation is the dual counterpart of the Delaunay triangulation in the graph theoretic sense.
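The link between the quadratic local dual error and the Delaunay property can be observed on a toy planar grid. The following sketch is illustrative and rests on assumptions: five points in general position, a query point $\xi$ in the interior of the convex hull, and a brute-force search over all triangles containing $\xi$ (not an efficient triangulation routine). The function names are hypothetical.

```python
from itertools import combinations

def barycentric(tri, xi):
    """Barycentric coordinates of xi w.r.t. a 2D triangle, or None if degenerate."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None
    l2 = ((xi[0] - x1) * (y3 - y1) - (x3 - x1) * (xi[1] - y1)) / det
    l3 = ((x2 - x1) * (xi[1] - y1) - (xi[0] - x1) * (y2 - y1)) / det
    return (1.0 - l2 - l3, l2, l3)

def circumcircle(tri):
    """Centre and squared radius of the circle through three 2D points."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return (cx, cy), (cx - x1) ** 2 + (cy - y1) ** 2

def best_triangle(points, xi):
    """Minimise sum_i lam_i |xi - x_i|^2 over all triangles containing xi."""
    best, best_tri = float("inf"), None
    for tri in combinations(points, 3):
        lam = barycentric(tri, xi)
        if lam is None or min(lam) < -1e-12:
            continue
        cost = sum(l * ((xi[0] - x) ** 2 + (xi[1] - y) ** 2)
                   for l, (x, y) in zip(lam, tri))
        if cost < best:
            best, best_tri = cost, tri
    return best, best_tri

def has_delaunay_property(tri, points):
    """True if no grid point lies strictly inside the circumcircle of tri."""
    (cx, cy), r2 = circumcircle(tri)
    return all((x - cx) ** 2 + (y - cy) ** 2 >= r2 - 1e-12 for x, y in points)

points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 4.0), (2.0, 1.0)]
xi = (1.0, 1.0)
value, tri = best_triangle(points, xi)
print(value, tri, has_delaunay_property(tri, points))
```

In this example the minimising triangle has an empty circumcircle, in line with part (b) of the theorem.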
4 Existence of an optimal dual quantization grid

In order to derive the existence of the optimal dual quantization grids, i.e. the fact that the
infimum over all grids $\Gamma \subset \mathbb{R}^d$ with $|\Gamma| \le n$ in Definition 2 actually holds as a minimum, we
have to discuss properties of $F_p$ and $d_p$ as mappings of the quantization grid $\Gamma$. This leads us to
introduce "functional versions" of $F_p(\xi, \Gamma)$ and $d_p(X, \Gamma)$.

We therefore define for every $n \ge 1$ and every $n$-tuple $\gamma = (x_1, \ldots, x_n) \in (\mathbb{R}^d)^n$

$$F_{n,p}(\xi, \gamma) = \inf\Big\{\Big(\sum_{1 \le i \le n} \lambda_i \|\xi - x_i\|^p\Big)^{1/p} : \lambda_i \in [0,1] \text{ and } \sum_{1 \le i \le n} \lambda_i x_i = \xi,\ \sum_{1 \le i \le n} \lambda_i = 1\Big\}$$

and

$$d_{n,p}(X, \gamma) = \|F_{n,p}(X, \gamma)\|_{L^p}.$$

These functions are clearly symmetric and in fact only depend on the value set of $\gamma = (x_1, \ldots, x_n)$,
denoted $\Gamma = \Gamma_\gamma = \{x_i,\ i = 1, \ldots, n\}$ (with size at most $n$). Hence, we have

$$F_{n,p}(\xi, \gamma) = F_p(\xi; \Gamma_\gamma) \quad \text{and} \quad d_{n,p}(X, \gamma) = d_p(X; \Gamma_\gamma),$$

which implies

$$d_{n,p}(X) = \inf\{d_{n,p}(X, \gamma) : \gamma \in (\mathbb{R}^d)^n\}.$$

One also carries over these definitions to the unbounded case, i.e. we obtain $\bar F_{n,p}(\xi, \gamma)$ and
$\bar d_{n,p}(X, \gamma)$.

As in Section 2, we may drop a duplicate parameter $p$ in the $p$-th power of the above expressions,
e.g. we write $F_n^p(\xi, \gamma)$ instead of $F_{n,p}^p(\xi, \gamma)$. Moreover, we assume again without loss of generality
that $\mathrm{conv}(\mathrm{supp}\,\mathbb{P}_X)$ has a nonempty interior in $\mathbb{R}^d$ or equivalently that

$$\mathrm{span}(\mathrm{supp}\,\mathbb{P}_X) = \mathbb{R}^d.$$

4.1 Distributions with compact support 

We first handle the case when $\mathrm{supp}(\mathbb{P}_X)$ is compact.

Theorem 5. (a) Let $p \in [1, +\infty)$. For every integer $n \ge d+1$, the $L^p$-mean dual quantization
error function $\gamma \mapsto d_{n,p}(X, \gamma)$ is l.s.c. and if $p > 1$ it also attains a minimum.

(b) Let $p > 1$ and let $n \ge d+1$. If $|\mathrm{supp}(\mathbb{P}_X)| \ge n$, any optimal grid $\Gamma^{n,*}$ has size $n$ and
$d_{n,p}(X) = 0$ if and only if $|\mathrm{supp}(\mathbb{P}_X)| \le n$. Furthermore, the sequence $n \mapsto d_{n,p}(X)$ decreases
(strictly) to 0 as long as it does not vanish.

Remark. In Theorem 7(a) the continuity of $d_{n,p}(X, \cdot)$ is established when $\mathbb{P}_X$ assigns no mass
to hyperplanes (strong continuity).

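Proof. (a) Lower semi-continuity. Let $\gamma^{(k)} = (x_1^{(k)}, \ldots, x_n^{(k)})$, $k \ge 1$, be a sequence of $n$-tuples
that converges towards $\gamma^{(\infty)}$. Keeping in mind that the dual representation (see Proposition 5)
of $F_n^p$,

$$F_n^p(\xi, (x_1, \ldots, x_n)) = \sup_{u \in \mathbb{R}^d} \min_{1 \le i \le n} \big\{\|\xi - x_i\|^p + u^T(\xi - x_i)\big\},$$

implies that $F_n^p(\xi, \cdot)$ is l.s.c., we get

$$\liminf_{k \to \infty} F_n^p(\xi, \gamma^{(k)}) \ge F_n^p(\xi, \gamma^{(\infty)}).$$

Consequently, one derives that $d_{n,p}(X, \cdot)$ is l.s.c. since

$$\liminf_k d_n^p(X, \gamma^{(k)}) \ge \mathbb{E}\big(\liminf_k F_n^p(X, \gamma^{(k)})\big) \ge \mathbb{E}\big(F_n^p(X, \gamma^{(\infty)})\big) = d_n^p(X, \gamma^{(\infty)})$$

owing to Fatou's lemma.

Existence of an optimal dual quantization grid. Assume that $\gamma^{(k)}$, $k \ge 1$, is a general sequence
of $n$-tuples such that $\liminf_k d_{n,p}(X, \gamma^{(k)}) < +\infty$ (which exists owing to Proposition 3(b)). Then
$\liminf_k \min_{1 \le i \le n} |x_i^{(k)}| < +\infty$ since otherwise one has

$$\liminf_{k \to \infty} d_n^p(X, \gamma^{(k)}) \ge \liminf_{k \to \infty} \mathbb{E}\,\mathrm{dist}(X, \gamma^{(k)})^p \ge \mathbb{E} \liminf_{k \to \infty} \mathrm{dist}(X, \gamma^{(k)})^p = +\infty$$

owing to Fatou's lemma.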
Now, up to appropriate extractions, one may assume that $d_{n,p}(X, \gamma^{(k)})$ converges to a finite limit
and that there exists a nonempty set of indices $J_\infty \subset \{1, \ldots, n\}$ such that

$$\forall j \in J_\infty,\ x_j^{(k)} \to x_j^{(\infty)}, \qquad \forall j \notin J_\infty,\ \|x_j^{(k)}\| \to +\infty \quad \text{as } k \to \infty.$$

Let $\xi \in \mathrm{supp}(\mathbb{P}_X)$, let $\gamma^{(\infty)}$ be any $n$-tuple of $(\mathbb{R}^d)^n$ such that $\Gamma_{\gamma^{(\infty)}} = \{x_j^{(\infty)},\ j \in J_\infty\}$ and denote
$n_\infty = |J_\infty|$. We then want to show

$$\liminf_{k \to \infty} F_n^p(\xi, \gamma^{(k)}) \ge F_{n_\infty}^p(\xi, \gamma^{(\infty)}). \tag{21}$$

Moreover, let $u \in \mathbb{R}^d$ and $(y_k)_{k \ge 1}$ be a sequence such that $\|y_k\| \to +\infty$. Then it holds for $p > 1$

$$\|\xi - y_k\|^p + u^T(\xi - y_k) \to +\infty \quad \text{as } k \to \infty. \tag{22}$$

In the case when $u^T(\xi - y_k)$ is bounded from below, the above claim (22) is trivial. Otherwise,
we have $u^T(\xi - y_k) \to -\infty$ so that for $k$ large enough it holds

$$\|\xi - y_k\|^p + u^T(\xi - y_k) = \|\xi - y_k\|^p - |u^T(\xi - y_k)|.$$

Applying Cauchy-Schwarz and using the equivalence of norms on $\mathbb{R}^d$ we arrive at

$$\|\xi - y_k\|^p + u^T(\xi - y_k) \ge \|\xi - y_k\|^p - |u|_2\,|\xi - y_k|_2 \ge \|\xi - y_k\|\big(\|\xi - y_k\|^{p-1} - C_{\|\cdot\|}\,|u|_2\big) \to +\infty.$$

This yields for any $u \in \mathbb{R}^d$

$$\liminf_{k} \min_{1 \le i \le n} \big\{\|\xi - x_i^{(k)}\|^p + u^T(\xi - x_i^{(k)})\big\} \ge \min_{i \in J_\infty} \big\{\|\xi - x_i^{(\infty)}\|^p + u^T(\xi - x_i^{(\infty)})\big\},$$

so that the dual representation of $F_n^p$ finally implies (21).

Now, assume that the sequence $(\gamma^{(k)})_{k \ge 1}$ is asymptotically optimal in the sense that $d_{n,p}(X) = \lim_k d_{n,p}(X, \gamma^{(k)}) < +\infty$.
Fatou's lemma and (21) imply

$$d_{n,p}(X) = \lim_k d_{n,p}(X, \gamma^{(k)}) \ge d_{n_\infty,p}(X, \Gamma_{\gamma^{(\infty)}}) \ge d_{n_\infty,p}(X) \ge d_{n,p}(X)$$

so that

$$d_{n,p}(X) = d_{n_\infty,p}(X; \Gamma_{\gamma^{(\infty)}}) = d_{n_\infty,p}(X).$$

This proves the existence of an optimal dual quantizer at level $n$.

(b) To prove that the $L^p$-mean dual quantization error decreases with optimal grids of full size $n$
at level $n$, as long as it does not vanish, we will proceed by induction.

Case $n = d+1$. Then $J_\infty^c = \emptyset$ and furthermore $\Gamma_{\gamma^{(\infty)}}$ has size $d+1$ since its convex hull contains
$\mathrm{supp}(\mathbb{P}_X)$ which has a nonempty interior. Owing to the lower semi-continuity of the function
$d_{n,p}(X, \cdot)$, $\gamma^{(\infty)}$ is optimal. Furthermore, if $\mathrm{supp}(\mathbb{P}_X) = \Gamma_{n_0} := \{x_1, \ldots, x_{n_0}\}$ has size $n_0 \le d+1$,
then setting successively for every $i_0 \in \{1, \ldots, n_0\}$, $\xi = x_{i_0}$, $\lambda_j = \delta_{i_0 j}$ (Kronecker symbol) yields
$F_{n_0,p}(\xi; \Gamma_{n_0}) = 0$ for every $\xi \in \Gamma_{n_0}$, which implies $d_{n_0,p}(X) = d_{n_0,p}(X; \Gamma_{n_0}) = 0$.

Case $n > d+1$. Assume now that $|\mathrm{supp}(\mathbb{P}_X)| \ge n$. Then there exists by the induction
assumption an optimal grid $\Gamma_{n-1}^* = \{x_1^*, \ldots, x_{n-1}^*\} \subset \mathbb{R}^d$ at level $n-1$ which is optimal for
$d_{n-1,p}(X, \cdot)$ and contains exactly $n-1$ points. By Proposition [?](a), this grid contains $d+1$ affinely
independent points since $d_{n-1,p}(X) < +\infty$ (and $\mathrm{span}(\mathrm{supp}(\mathbb{P}_X)) = \mathbb{R}^d$), i.e. $\mathrm{aff.dim}\,\Gamma_{n-1}^* = d$. Let
$\xi_0 \in \mathrm{supp}(\mathbb{P}_X) \setminus \Gamma_{n-1}^*$ and let $\Gamma_{n-1}(\xi_0) = \{x_i^*,\ i \in I_0\}$ be some affinely independent points from
$\Gamma_{n-1}^*$, solution to the optimization problem (LP) at level $n-1$ for $F_{n-1,p}(\xi_0; \Gamma_{n-1}^*)$. By the
incomplete (affine) basis theorem, there exists $I \subset \{1, \ldots, n-1\}$ such that

$$I \supset I_0, \quad |I| = d+1, \quad \{x_i^*,\ i \in I\} \text{ is an affine basis of } \mathbb{R}^d.$$
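By the (affine) exchange lemma, for every index $j \in I_0$, $\{x_i^*,\ i \in I,\ i \ne j\} \cup \{\xi_0\}$ is an affine basis
of $\mathbb{R}^d$. Furthermore $\bigcup_{j \in I_0} \big(B(\xi_0; \varepsilon) \cap \mathrm{conv}(\{x_i^*,\ i \in I,\ i \ne j\} \cup \{\xi_0\})\big)$ is a neighbourhood of $\xi_0$ in
$\mathrm{conv}(\Gamma_{n-1}^*)$ since $\xi_0 \in \mathrm{supp}(\mathbb{P}_X) \subset \mathrm{conv}(\Gamma_{n-1}^*)$. Consequently there exists $i_0 \in I_0$ such that

$$\mathbb{P}\big(X \in B(\xi_0; \varepsilon) \cap \mathrm{conv}(\{x_i^*,\ i \in I,\ i \ne i_0\} \cup \{\xi_0\})\big) > 0.$$

Now for every $v \in B(0; 1)$ (w.r.t. $\|\cdot\|$), $v$ writes on the vector basis $\{x_i^* - \xi_0\}_{i \in I \setminus \{i_0\}}$,
$v = \sum_{i \in I \setminus \{i_0\}} \theta_i (x_i^* - \xi_0)$, with coordinates $\theta_i$ satisfying $\sum_{i \in I \setminus \{i_0\}} |\theta_i| \le C_{d,\|\cdot\|,X}$, where
$C_{d,\|\cdot\|,X} \in [1, +\infty)$ only depends on $d$, the norm $\|\cdot\|$ and $X$ (through the grid $\Gamma^*$).

Let $\varepsilon \in (0, (C_{d,\|\cdot\|,X} + 1)^{-1})$ be a positive real number to be specified later on.

Let $\zeta \in B_{\|\cdot\|}(\xi_0; \varepsilon) \cap \mathrm{conv}(\{x_i^*,\ i \in I,\ i \ne i_0\} \cup \{\xi_0\})$. Then $v = \frac{\zeta - \xi_0}{\varepsilon} \in B_{\|\cdot\|}(0; 1)$ and

$$\zeta = \underbrace{\Big(1 - \varepsilon \sum_{i \in I \setminus \{i_0\}} \theta_i\Big)}_{> 0}\,\xi_0 + \varepsilon \sum_{i \in I \setminus \{i_0\}} \theta_i\, x_i^*.$$

Furthermore, by the uniqueness of the decomposition (with sum equal to 1), we also know that
$\theta_i \ge 0$, $i \in I \setminus \{i_0\}$. Consequently

$$F_n^p(\zeta, \Gamma_{n-1}^* \cup \{\xi_0\}) \le \Big(1 - \varepsilon \sum_{i \in I \setminus \{i_0\}} \theta_i\Big) \|\zeta - \xi_0\|^p + \varepsilon \sum_{i \in I \setminus \{i_0\}} \theta_i\, \|\zeta - x_i^*\|^p.$$

Now set $L^* := \max_{i \in I} \|\xi_0 - x_i^*\|$. Then

$$\|\zeta - \xi_0\| \le \varepsilon \sum_{i \in I \setminus \{i_0\}} \theta_i\, \|x_i^* - \xi_0\| \le \varepsilon\, C_{d,\|\cdot\|,X}\, L^*$$

and, for every $i \in I \setminus \{i_0\}$,

$$\|\zeta - x_i^*\| \le \|\zeta - \xi_0\| + L^* \le (\varepsilon\, C_{d,\|\cdot\|,X} + 1) L^* \le 2L^*.$$

Finally, for every $\varepsilon \in \big(0, \frac{1}{C_{d,\|\cdot\|,X} + 1}\big)$ and every $\zeta \in B(\xi_0; \varepsilon) \cap \mathrm{conv}(\{x_i^*,\ i \in I,\ i \ne i_0\} \cup \{\xi_0\})$,

$$F_n^p(\zeta, \Gamma_{n-1}^* \cup \{\xi_0\}) \le \varepsilon Z^* \quad \text{with} \quad Z^* = C_{d,\|\cdot\|,X}\,(L^*)^p\,(1 + 2^p).$$

On the other hand, if $\varepsilon < \mathrm{dist}(\xi_0, \Gamma_{n-1}^*)$,

$$F_{n-1}^p(\zeta, \Gamma_{n-1}^*) \ge \mathrm{dist}(\zeta, \Gamma_{n-1}^*)^p \ge \big(\mathrm{dist}(\xi_0, \Gamma_{n-1}^*) - \varepsilon\big)^p$$

so that, for small enough $\varepsilon$, $\varepsilon Z^* < F_{n-1}^p(\zeta, \Gamma_{n-1}^*)$, which finally proves the existence of an $\varepsilon > 0$
such that

$$\forall \zeta \in B_{\|\cdot\|}(\xi_0; \varepsilon) \cap \mathrm{conv}(\{x_i^*,\ i \in I,\ i \ne i_0\} \cup \{\xi_0\}), \quad F_n^p(\zeta, \Gamma_{n-1}^* \cup \{\xi_0\}) < F_{n-1}^p(\zeta, \Gamma_{n-1}^*).$$

As a first result,

$$d_n^p(X) \le d^p(X; \Gamma_{n-1}^* \cup \{\xi_0\}) < d^p(X; \Gamma_{n-1}^*) = d_{n-1}^p(X).$$

Furthermore, this shows that $J_\infty^c$ is empty, i.e. all the components of the subsequence $(\gamma^{(k)})$
remain bounded and converge towards $\gamma^{(\infty)}$. Hence $\gamma^{(\infty)}$ has $n$ pairwise distinct components
since $d_{n,p}(X, \gamma^{(\infty)}) = d_{n,p}(X) < d_{n-1,p}(X)$ owing to the l.s.c.

Finally, the convergence to 0 follows from Proposition 13. □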
Further comments: Assume that $\mathrm{conv}(\mathrm{supp}(\mathbb{P}_X))$ is spanned by finitely many (extremal) points of
$\mathrm{supp}(\mathbb{P}_X)$, i.e. there exists $\Gamma_{ext} \subset \mathrm{supp}(\mathbb{P}_X)$, $|\Gamma_{ext}| < +\infty$, such that

$$\mathrm{conv}(\mathrm{supp}(\mathbb{P}_X)) = \mathrm{conv}(\Gamma_{ext}), \quad \Gamma_{ext} \subset \mathrm{supp}(\mathbb{P}_X)$$

(we may assume w.l.o.g. that $|\Gamma_{ext}| \ge d+1$). In such a geometric configuration, it is natural to
define a variant of the optimal $L^p$-mean dual quantization by only considering, for $n \ge |\Gamma_{ext}|$,
grids $\Gamma$ containing $\Gamma_{ext}$ and contained in $\mathrm{conv}(\mathrm{supp}\,\mathbb{P}_X)$, leading to

$$d_{n,p}^{ext}(X) = \inf\{\|F_p(X; \Gamma)\|_{L^p},\ \Gamma_{ext} \subset \Gamma \subset \mathrm{conv}(\mathrm{supp}(\mathbb{P}_X)),\ |\Gamma| \le n\}. \tag{23}$$

For this error modulus the existence of an optimal quantizer directly follows from the l.s.c. of
$\gamma \mapsto d_{n,p}(X, \gamma)$ (with the usual convention). When these two notions of dual quantization
co-exist (e.g. for parallelepipedic sets), it does not mean that they coincide, even in the quadratic
Euclidean case.

4.2 Distributions with unbounded support 

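Let $X \in L^p(\mathbb{P})$ and let $n \ge 1$. We define

$$\bar F_p(\xi; \Gamma) = F_p(\xi; \Gamma)\,1_{\{\xi \in \mathrm{conv}(\Gamma)\}} + \mathrm{dist}(\xi, \Gamma)\,1_{\{\xi \notin \mathrm{conv}(\Gamma)\}}$$

and

$$\bar d_p(X; \Gamma) = \|\bar F_p(X; \Gamma)\|_{L^p} < +\infty,$$

since $\bar d_p(X; \Gamma) \le \mathrm{diam}(\Gamma) + \|\mathrm{dist}(X, \Gamma)\|_{L^p}$.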

Theorem 6. Let $p > 1$. Assume that the distribution $\mathbb{P}_X$ is strongly continuous, namely

$$\forall H \text{ hyperplane of } \mathbb{R}^d, \quad \mathbb{P}(X \in H) = 0,$$

and has a support with a nonempty interior. Then the extended $L^p$-mean dual quantization error
function $\gamma \mapsto \bar d_{n,p}(X, \gamma)$ is l.s.c. Furthermore, it attains a minimum and $\bar d_{n,p}(X)$ is decreasing
down to 0.

First we need a lemma which shows that under the strong continuity assumption made on $\mathbb{P}_X$,
optimal (or nearly optimal) grids cannot lie in an affine hyperplane.

Lemma 1. Let $p > 1$. If $\mathbb{P}_X$ is strongly continuous, then

$$\varepsilon_{d-1,p}(X) := \inf\{\|\mathrm{dist}(X, H)\|_{L^p},\ H \text{ hyperplane}\} > 0.$$

Proof. Let $\kappa := \inf_{|u|_2 = 1} \|u\| > 0$ where $|\cdot|_2$ denotes the canonical Euclidean norm. Let $(\cdot|\cdot)$
denote the canonical inner product. Let $H = b + u^\perp$, $b \in \mathbb{R}^d$, $u \in \mathbb{R}^d$, $|u|_2 = 1$, be a hyperplane.
If $a \in H$,

$$\|X - a\| \ge \kappa\,|X - a|_2 \ge \kappa\,|(X - a|u)| = \kappa\,|(X - b|u)|$$

so that $\mathrm{dist}(X, H) \ge \kappa\,|(X - b|u)|$. Now, if $\varepsilon_{d-1,p}(X) = 0$, then there exist two sequences
$(u_n)_{n \ge 1}$ and $(b_n)_{n \ge 1}$ such that $|u_n|_2 = 1$ and $\varepsilon_n := \kappa\,\|(X - b_n|u_n)\|_{L^p} \to 0$. In particular
$|(b_n|u_n)| \le 2\|X\|_{L^p} + \varepsilon_n$. Up to an extraction one may assume that $u_n \to u_\infty$ (with $|u_\infty|_2 = 1$)
and $(b_n|u_n) \to \ell \in \mathbb{R}$. Then, by continuity of the $L^p$-norm, $(X|u_\infty) = \ell$ $\mathbb{P}$-a.s., which contradicts
the strong continuity assumption since $\{x \in \mathbb{R}^d : (x|u_\infty) = \ell\}$ is a hyperplane. □
Proof of Theorem 6. The proof closely follows the lines of the compactly supported case. Let
$\gamma^{(k)}$, $k \ge 1$, be a sequence of $n$-tuples such that $\liminf_k \bar d_{n,p}(X, \gamma^{(k)}) < +\infty$. Let $J_\infty$ be defined
like in the proof of Theorem 5 (after the appropriate extractions). Set $\Gamma_{\gamma^{(\infty)}} = \{x_j^{(\infty)},\ j \in J_\infty\}$
and $\gamma^{(\infty)}$ accordingly.

Let $\xi \in \mathbb{R}^d$ and let $k'$ be a subsequence (depending on $\xi$) such that $\liminf_k \bar F_{n,p}(\xi, \gamma^{(k)}) = \lim_{k'} \bar F_{n,p}(\xi, \gamma^{(k')})$.
We will inspect three cases:

- If $\xi \in \limsup_{k'} \mathrm{conv}(\gamma^{(k')})$, then there exists a subsequence $k''$ such that $\xi \in \mathrm{conv}\{\gamma^{(k'')}\}$ and,
following the lines of the proof of Theorem 5(a), one proves that either $+\infty = \lim_{k''} F_{n,p}(\xi, \gamma^{(k'')}) \ge \bar F_{n,p}(\xi, \gamma^{(\infty)})$
or $\xi \in \mathrm{conv}\{\gamma^{(\infty)}\}$ and

$$\bar F_{n,p}(\xi, \gamma^{(\infty)}) = F_{n,p}(\xi, \gamma^{(\infty)}) \le \liminf_{k''} F_{n,p}(\xi, \gamma^{(k'')}) = \lim_{k''} \bar F_{n,p}(\xi, \gamma^{(k'')}) = \liminf_k \bar F_{n,p}(\xi, \gamma^{(k)}).$$

- If $\xi \notin \limsup_{k'} \mathrm{conv}(\gamma^{(k')})$ and $\xi \notin \partial\,\mathrm{conv}\{\gamma^{(\infty)}\}$, then, for large enough $k'$,

$$\bar F_{n,p}(\xi, \gamma^{(k')}) = \mathrm{dist}(\xi, \gamma^{(k')}) \to \mathrm{dist}(\xi, \Gamma_{\gamma^{(\infty)}}) = \bar F_{n,p}(\xi, \gamma^{(\infty)}).$$

- Otherwise, $\xi$ belongs to $\partial\,\mathrm{conv}\{\gamma^{(\infty)}\}$. At such points $\bar F_{n,p}(\xi, \cdot)$ is not l.s.c. at $\gamma^{(\infty)}$ but the
boundary of the convex hull of finitely many points is made up with affine hyperplanes so that
this boundary is negligible for $\mathbb{P}_X$.

Finally this proves that

$$\mathbb{P}_X(d\xi)\text{-a.s.} \quad \liminf_k \bar F_{n,p}(\xi, \gamma^{(k)}) \ge \bar F_{n,p}(\xi, \gamma^{(\infty)}).$$

One concludes using Fatou's Lemma like in the compact case that, on the one hand, $\bar d_{n,p}(X, \cdot)$
is l.s.c. by considering a sequence $\gamma^{(k)}$ converging to $\gamma^{(\infty)}$ and, on the other hand, that there
exists an $L^p$-optimal grid for $\bar d_{n,p}(X, \cdot)$, namely by considering an asymptotically optimal
sequence $(\gamma^{(k)})_{k \ge 1}$, since

$$\bar d_{n,p}(X) = \lim_k \bar d_{n,p}(X, \gamma^{(k)}) \ge \bar d_p(X, \Gamma_{\gamma^{(\infty)}}) \ge \bar d_{|J_\infty|,p}(X) \ge \bar d_{n,p}(X)$$

so that in fact $\bar d_{n,p}(X) = \bar d_p(X, \Gamma_{\gamma^{(\infty)}}) = \bar d_{|J_\infty|,p}(X)$.

For any grid $\Gamma$ with size at most $d$, $\mathbb{P}(X \in \mathrm{conv}(\Gamma)) = 0$ so that $\mathbb{P}_X(d\xi)$-a.s., $\bar F_{n,p}(\xi, \Gamma) = \mathrm{dist}(\xi, \Gamma)$
owing to the strong continuity of $\mathbb{P}_X$. Hence, dual and primal quantization coincide, which ensures
the existence of optimal grids.

Let $n \ge d+1$. Assume temporarily that any optimal grid at level $n$, denoted $\Gamma^{*,n}$, is "flat",
i.e. $\mathrm{conv}(\Gamma^{*,n})$ has an empty interior or equivalently that the affine subspace spanned by $\Gamma^{*,n}$ is
included in a hyperplane $H_n$. Then, owing to the strong continuity assumption and Lemma 1,

$$\bar d_{n,p}(X) = \bar d_p(X, \Gamma^{*,n}) \ge \|\mathrm{dist}(X, H_n)\|_{L^p} \ge \varepsilon_{d-1,p}(X) > 0.$$

Consequently this inequality fails for large enough $n$ since $\bar d_{n,p}(X) \to 0$, i.e. the interior of $\mathrm{conv}(\Gamma^{*,n})$ is nonempty for
large enough $n$.

Now assume that $\mathrm{int}(\mathrm{conv}(\Gamma^{*,n'})) \cap \mathrm{supp}(\mathbb{P}_X) \subset \Gamma^{*,n'}$ for an infinite subsequence $n'$. Let $\xi_0 \in \mathbb{R}^d$ and
$\varepsilon_0 > 0$ be such that $B(\xi_0, \varepsilon_0) \subset \mathrm{supp}(\mathbb{P}_X)$. This implies that $B(\xi_0, \varepsilon_0) \cap \mathrm{int}(\mathrm{conv}(\Gamma^{*,n'})) = \emptyset$.
Then, for every $\xi \in B(\xi_0, \varepsilon_0/2)$, $\bar F_p(\xi, \Gamma^{*,n'}) = \mathrm{dist}(\xi, \Gamma^{*,n'}) \ge \varepsilon_0/2$ so that

$$\bar d_p(X, \Gamma^{*,n'}) \ge (\varepsilon_0/2)\,\mathbb{P}\big(X \in B(\xi_0, \varepsilon_0/2)\big)^{1/p} > 0,$$

which contradicts the optimality of $\Gamma^{*,n'}$ at level $n'$, at least for $n'$ large enough. Consequently for
every large enough $n$,

$$\big(\mathrm{conv}(\Gamma^{*,n}) \setminus \Gamma^{*,n}\big) \cap \mathrm{supp}(\mathbb{P}_X) \ne \emptyset.$$

Let $\xi_0$ be in this nonempty set. The proof of Theorem 5(b) applies at this stage and this shows
that $\bar d_{n,p}(X)$ is (strictly) decreasing. □

5 Numerical computation of optimal dual quantizers 

In order to derive optimal dual quantizers numerically, i.e. by means of gradient based optimization
procedures, we have to verify the differentiability of the mapping

$$\gamma \mapsto d_{n,p}(X, \gamma), \quad \gamma \in (\mathbb{R}^d)^n,$$

and derive its first order derivative.

Therefore, we will need a (dual) non-degeneracy assumption on the Linear Program $F_n^p(\xi, \gamma)$ to
establish the existence of the gradient of $d_n^p(X, \cdot)$, a bit like what is needed for $e_{n,p}(X, \cdot)$.

Definition 7. A grid $\Gamma_\gamma = \{x_1, \ldots, x_n\}$ (related to the $n$-tuple $\gamma$) is non-degenerate with respect
to $X$ if, for every $I \in \mathcal{I}(\Gamma_\gamma)$ and for $\mathbb{P}_X(d\xi)$-almost every $\xi \in D_I(\Gamma_\gamma) \cap \mathrm{supp}(\mathbb{P}_X)$, it holds

$$A_{I^c}^T u < c_{I^c} \quad \text{where } u = (A_I^T)^{-1} c_I.$$

Example. In the Euclidean case (see [15]), this assumption is fulfilled regardless of $X$, as soon as
the Delaunay triangulation is intrinsically non-degenerate, i.e. no $d+2$ points lie on a hypersphere.
Note it also implies the uniqueness of this Delaunay triangulation.

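Theorem 7. Let $X \in L^p_{\mathbb{R}^d}(\mathbb{P})$, $p \ge 1$, such that $\mathbb{P}_X$ satisfies the strong continuity assumption.
Moreover, let $\gamma_0 = (x_1, \ldots, x_n)$ be an $n$-tuple in $(\mathbb{R}^d)^n$ such that $\mathrm{supp}(\mathbb{P}_X) \subset \mathrm{conv}(\Gamma_{\gamma_0})$. Then:

(a) The mapping

$$\gamma \mapsto d_n^p(X, \gamma), \quad \gamma \in (\mathbb{R}^d)^n,$$

is continuous in $\gamma_0$.

(b) If $\gamma_0 = (x_1, \ldots, x_n)$ is non-degenerate with respect to $X$ and $y = (y^1, \ldots, y^d) \mapsto \|y\|^p$ is
differentiable on $\mathbb{R}^d$, then $d_n^p(X, \cdot)$ is differentiable at $\gamma_0$ with partial derivatives

$$\frac{\partial}{\partial x_i^j} d_n^p(X, \gamma_0) = \mathbb{E}\Big[\lambda_i(X)\Big(\frac{\partial \|\cdot\|^p}{\partial y^j}(x_i - X) - u^j(X)\Big)\Big], \quad 1 \le j \le d,\ 1 \le i \le n,$$

where $\lambda(X)$ and $u(X)$ are the $\mathbb{P}_X$-a.s. unique primal and dual solutions for the Linear Program
$F_n^p(X, \gamma_0)$.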

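Proof. (a) Owing to Theorem 5(a), it remains to show that $d_n^p(X, \cdot)$ is u.s.c. at $\gamma_0 = (x_1, \ldots, x_n)$.
Therefore, denote by $H_{\gamma_0}$ the set of all hyperplanes generated by any subset $\{x_{i_1}, \ldots, x_{i_d}\}$ of $\Gamma_{\gamma_0}$
and let $\gamma_k = (x_1^k, \ldots, x_n^k) \in (\mathbb{R}^d)^n$ be a sequence converging to $\gamma_0$ as $k \to \infty$. We will then show
for every $\xi \in \mathrm{supp}(\mathbb{P}_X) \setminus H_{\gamma_0}$

$$\limsup_{k \to \infty} F_n^p(\xi, \gamma_k) \le F_n^p(\xi, \gamma_0).$$

Consequently, let $\xi \in \mathrm{supp}(\mathbb{P}_X) \setminus H_{\gamma_0}$ and let $I \in \mathcal{I}(\Gamma_{\gamma_0})$ be a basis such that $\xi \in D_I(\Gamma_{\gamma_0})$. Since
$\xi \notin H_{\gamma_0}$, it lies in the interior of $\mathrm{conv}\{x_j : j \in I\}$, which implies $\lambda_I = A_I^{-1} b > 0$ and

$$F_n^p(\xi, \gamma_0) = \lambda_I^T c_I.$$

Denoting

$$A^k = \begin{bmatrix} x_1^k & \cdots & x_n^k \\ 1 & \cdots & 1 \end{bmatrix} \quad \text{and} \quad c^k = \begin{bmatrix} \|\xi - x_1^k\|^p \\ \vdots \\ \|\xi - x_n^k\|^p \end{bmatrix},$$

we clearly have $A^k \to A$ and $c^k \to c$ as $k \to \infty$.
Moreover, $A_I^k$ is regular for $k$ large enough, so that $(A_I^k)^{-1} \to A_I^{-1}$ as well. But this also implies
for $\lambda_I^k = (A_I^k)^{-1} b$

$$\lambda_I^k \to \lambda_I \quad \text{and} \quad \lambda_I^k > 0 \text{ for } k \text{ large enough}.$$

Therefore, setting $\lambda_j^k = 0$, $j \in I^c$, yields $A^k \lambda^k = b$ so that

$$\limsup_{k} F_n^p(\xi, \gamma_k) \le \lim_k (\lambda^k)^T c^k = \lim_k (\lambda_I^k)^T c_I^k = \lambda_I^T c_I = F_n^p(\xi, \gamma_0).$$

Since $\mathbb{P}(X \in H_{\gamma_0}) = 0$ and $d_n^p(X, \gamma_0) < +\infty$ by assumption, Fatou's Lemma yields the u.s.c. of
$d_n^p(X, \cdot)$ in $\gamma_0$.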

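(b) Let $N_{\gamma_0}$ denote the $\mathbb{P}_X$-negligible set of points $\xi$ on which $F_n^p(\xi, \gamma_0)$ is dually degenerate in
the sense of Definition 7. Moreover let $\xi \in \mathrm{supp}(\mathbb{P}_X) \setminus (H_{\gamma_0} \cup N_{\gamma_0})$. Then the Linear Program
$F_n^p(\xi, \gamma_0)$ is also non-degenerate in the primal sense since $\xi \notin H_{\gamma_0}$ lies in the interior of any
optimal basis $I = I^* \in \mathcal{I}(\Gamma_{\gamma_0})$ for the (LP) problem, which means $A_I^{-1} b > 0$.
Now, owing to Proposition 6, let $\lambda$ and $u$ denote primal and dual solutions for $F_n^p(\xi, \gamma_0)$, i.e.

$$F_n^p(\xi, \gamma_0) = \lambda_I^T c_I = u^T b. \tag{24}$$

As a consequence $c_I - A_I^T u + \lambda_I = \lambda_I > 0$ whereas $c_{I^c} - A_{I^c}^T u + \lambda_{I^c} = c_{I^c} - A_{I^c}^T u > 0$ owing to the
non-degeneracy assumption since $\xi \notin N_{\gamma_0}$. Finally

$$c - A^T u + \lambda = \begin{bmatrix} c_I - A_I^T u + \lambda_I \\ c_{I^c} - A_{I^c}^T u + \lambda_{I^c} \end{bmatrix} > 0.$$

Since

$$\gamma \mapsto c - A^T u + \lambda$$

is continuous at $\gamma_0$, there exists a neighborhood $U(\gamma_0)$ of $\gamma_0$ such that, with obvious notations,
for every $\gamma' = (x_1', \ldots, x_n') \in U(\gamma_0)$

$$c' - (A')^T u' + \lambda' > 0,$$

where

$$A' = \begin{bmatrix} x_1' & \cdots & x_n' \\ 1 & \cdots & 1 \end{bmatrix}, \quad c' = \begin{bmatrix} \|\xi - x_1'\|^p \\ \vdots \\ \|\xi - x_n'\|^p \end{bmatrix}, \quad \lambda' = \big((A_I')^{-1} b,\ 0\big), \quad u' = \big((A_I')^T\big)^{-1} c_I'.$$

But this implies by Proposition 6 that $\xi \in D_I(\Gamma_{\gamma'})$ as well (i.e. $I$ is also optimal) for every
$\gamma' \in U(\gamma_0)$, so that we conclude

$$F_n^p(\xi, \gamma') = (\lambda_I')^T c_I' = (u')^T b.$$

Therefore we may differentiate the identity (24) formally with respect to the grid $\gamma_0 = (x_1, \ldots, x_n)$
where $x_i = (x_i^1, \ldots, x_i^d)$, $i = 1, \ldots, n$. In practice, we will compute the partial derivatives with
respect to $x_i^j$, $i \in I$, $j \in \{1, \ldots, d\}$, after noting that $\frac{\partial A_I}{\partial x_i^j} = [\delta_{ij}]$ (Kronecker symbol) and
that the differential of $A \mapsto A^{-1}$ on $GL(d, \mathbb{R})$ is given by $d(A^{-1}) = -A^{-1}(dA)A^{-1}$. Then, still with

$$A_I = \begin{bmatrix} x_{i_1} & \cdots & x_{i_{d+1}} \\ 1 & \cdots & 1 \end{bmatrix}, \quad c_I = \begin{bmatrix} \|\xi - x_{i_1}\|^p \\ \vdots \\ \|\xi - x_{i_{d+1}}\|^p \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} \xi \\ 1 \end{bmatrix},$$

we get

$$\frac{\partial F_n^p}{\partial x_i^j}(\xi, \gamma_0) = -\lambda_I^T [\delta_{ij}]\, u(\xi) + \lambda_i(\xi)\, \frac{\partial \|\cdot\|^p}{\partial y^j}(x_i - \xi) = \lambda_i(\xi)\Big(\frac{\partial \|\cdot\|^p}{\partial y^j}(x_i - \xi) - u^j(\xi)\Big),$$

which is bounded as a function of $\xi$ on any compact set, so that the assertion follows. □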
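The pointwise derivative used in this proof can be sanity-checked by finite differences in dimension one. The sketch below is an illustration under assumptions: a single cell $[a, b]$ with $\xi$ in its interior, the exponent $p = 3$, and hypothetical helper names. On such a cell the primal solution is the pair of barycentric weights and the dual variable $u_1$ solves $u_1 a + u_2 = |\xi - a|^p$, $u_1 b + u_2 = |\xi - b|^p$.

```python
def dual_cell(a, b, xi, p):
    """Primal weights, dual variable u_1 and F^p(xi) on the 1D cell [a, b]."""
    ca, cb = abs(xi - a) ** p, abs(xi - b) ** p
    lam_a = (b - xi) / (b - a)
    u1 = (cb - ca) / (b - a)        # from u1*a + u2 = ca and u1*b + u2 = cb
    return lam_a, 1.0 - lam_a, u1, lam_a * ca + (1.0 - lam_a) * cb

def dnorm_p(y, p):
    """Derivative of y -> |y|^p for p > 1."""
    sign = 1.0 if y > 0 else -1.0 if y < 0 else 0.0
    return p * abs(y) ** (p - 1) * sign

a, b, xi, p, eps = 0.2, 0.9, 0.45, 3.0, 1e-6
lam_a, lam_b, u1, _ = dual_cell(a, b, xi, p)

# derivative formula: lambda_i * (d|.|^p(x_i - xi) - u)
grad_a = lam_a * (dnorm_p(a - xi, p) - u1)
grad_b = lam_b * (dnorm_p(b - xi, p) - u1)

# central finite differences of F^p with respect to each grid point
fd_a = (dual_cell(a + eps, b, xi, p)[3] - dual_cell(a - eps, b, xi, p)[3]) / (2 * eps)
fd_b = (dual_cell(a, b + eps, xi, p)[3] - dual_cell(a, b - eps, xi, p)[3]) / (2 * eps)
print(grad_a, fd_a, grad_b, fd_b)
```

Taking the expectation of this pointwise derivative over $\xi = X$ gives the partial derivatives of $d_n^p(X, \cdot)$ stated in part (b) of the theorem.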



5.1 One dimensional setting 

In the one dimensional case, we can derive, due to a simpler geometrical structure, more explicit
expressions for $F_n^p$, $d_n^p(X, \cdot)$ and its derivatives.

To be more precise, let $\gamma = (x_1, \ldots, x_n) \in \{(\xi_1, \ldots, \xi_n) \in \mathbb{R}^n,\ \xi_1 < \xi_2 < \cdots < \xi_n\}$. Then

$$D_I(\Gamma_\gamma) = [x_i, x_{i+1}] \quad \text{for } I = \{i, i+1\},$$

so that we arrive at the following formula for the dual quantization error

$$d_n^p(X, \gamma) = \sum_{i=1}^{n-1} \frac{1}{x_{i+1} - x_i} \int_{x_i}^{x_{i+1}} \big((x_{i+1} - \xi)(\xi - x_i)^p + (\xi - x_i)(x_{i+1} - \xi)^p\big)\, \mathbb{P}_X(d\xi). \tag{25}$$

When $\mathrm{supp}(\mathbb{P}_X)$ is compact, we set $I = [a, b] = \mathrm{conv}(\mathrm{supp}(\mathbb{P}_X))$, we fix the endpoints of the grid
(following (25) though keeping the notation $d_n^p$) and we consider $\gamma \in \{(\xi_1, \ldots, \xi_n) \in I^n,\ a = \xi_1 < \xi_2 < \cdots < \xi_n = b\}$.

Uniform distribution: For the uniform distribution $U([0,1])$ we can even compute the exact
solutions for the dual quantization problem. Therefore, one easily derives from (25)

$$d_n^p(U([0,1]), \gamma) = \frac{2}{(p+1)(p+2)} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^{p+1}, \quad x_1 = 0,\ x_n = 1,$$

so that setting $y_i = x_{i+1} - x_i$, $i = 1, \ldots, n-1$, yields

$$d_n^p(U([0,1])) = \frac{2}{(p+1)(p+2)} \min\Big\{\sum_{i=1}^{n-1} y_i^{p+1} : \sum_{i=1}^{n-1} y_i = 1,\ y_i \ge 0\Big\}.$$

The solution to this problem is obviously given by $y_i = \frac{1}{n-1}$, which implies that the grid

$$\gamma^* = \{x_i^* : 1 \le i \le n\} \quad \text{with} \quad x_i^* = \frac{i-1}{n-1}, \quad 1 \le i \le n,$$

is optimal and

$$d_n^p(U([0,1])) = \frac{2}{(p+1)(p+2)} \cdot \frac{1}{(n-1)^p}.$$



Recall, see e.g. [7], that it holds for ordinary quantization of the uniform distribution

$$x_i^* = \frac{2i-1}{2n}, \quad i = 1, \ldots, n, \quad \text{and} \quad e_n^p(U([0,1])) = \frac{1}{2^p (p+1)} \cdot \frac{1}{n^p},$$

so that we conclude for the sharp asymptotics

$$\lim_{n \to \infty} n^{1/d}\, d_{n,p}(U([0,1])) = \Big(\frac{2^{p+1}}{p+2}\Big)^{1/p} \lim_{n \to \infty} n^{1/d}\, e_{n,p}(U([0,1])).$$

Furthermore, we recognize that an optimal dual quantizer of size $n+1$, namely $\big(\frac{i-1}{n}\big)_{1 \le i \le n+1}$, is
made up by the $(n-1)$ midpoints of an optimal regular quantizer of size $n$ plus the two interval
endpoints. One may even show in this context that such a construction leads to asymptotically
optimal dual quantizers for any compactly supported distribution in dimension one.
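The closed form for the uniform distribution can be cross-checked against formula (25). The following sketch is illustrative: it evaluates (25) for $U([0,1])$ with a composite midpoint rule (the number of sub-intervals `m` and all function names are assumptions) and compares the equidistant grid with a skewed one.

```python
def dual_error_uniform(grid, p, m=2000):
    """d_n^p(U([0,1]), grid) from formula (25), by a composite midpoint rule."""
    total = 0.0
    for a, b in zip(grid, grid[1:]):
        h = (b - a) / m
        cell = 0.0
        for k in range(m):
            t = a + (k + 0.5) * h
            cell += (b - t) * (t - a) ** p + (t - a) * (b - t) ** p
        total += cell * h / (b - a)
    return total

def closed_form(n, p):
    """Optimal value 2 / ((p+1)(p+2)(n-1)^p) for the equidistant grid."""
    return 2.0 / ((p + 1) * (p + 2) * (n - 1) ** p)

n, p = 5, 2
uniform_grid = [i / (n - 1) for i in range(n)]
skewed_grid = [0.0, 0.2, 0.5, 0.8, 1.0]
d_opt = dual_error_uniform(uniform_grid, p)
d_skew = dual_error_uniform(skewed_grid, p)
voronoi = 1.0 / (2 ** p * (p + 1) * n ** p)   # e_n^p of the midpoint grid
print(d_opt, closed_form(n, p), d_skew, d_opt / voronoi)
```

The last printed ratio approaches $2^{p+1}/(p+2)$ as $n$ grows, which is the $p$-th power of the constant relating the two sharp rates in dimension one.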

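General quadratic case: In the general quadratic setup, we derive from Theorem 7 for $p = 2$
or, more simply in this 1D-setting, using directly (25) that, for an ordered grid $\gamma = (x_1, \ldots, x_n)$,

$$\frac{\partial d_n^2}{\partial x_i}(X, \gamma) = \int_{x_{i-1}}^{x_{i+1}} \xi\, \mathbb{P}_X(d\xi) - x_{i-1} \int_{x_{i-1}}^{x_i} \mathbb{P}_X(d\xi) - x_{i+1} \int_{x_i}^{x_{i+1}} \mathbb{P}_X(d\xi), \quad 2 \le i \le n-1.$$

If $\mathrm{conv}(\mathrm{supp}(\mathbb{P}_X)) = [a, b]$, following the variant (23), we statically fix the endpoints $x_1 = a$ and
$x_n = b$ in any optimization procedure to generate optimal dual quantizers.

Otherwise, in the unbounded case, we introduce boundary conditions taking into account a nearest
neighbor rule "outside" $[x_1, x_n]$:

$$\frac{\partial \bar d_n^{\,2}}{\partial x_1}(X, \gamma) = 2 \int_{-\infty}^{x_1} (x_1 - \xi)\, \mathbb{P}_X(d\xi) + \int_{x_1}^{x_2} (\xi - x_2)\, \mathbb{P}_X(d\xi),$$

$$\frac{\partial \bar d_n^{\,2}}{\partial x_n}(X, \gamma) = 2 \int_{x_n}^{+\infty} (x_n - \xi)\, \mathbb{P}_X(d\xi) + \int_{x_{n-1}}^{x_n} (\xi - x_{n-1})\, \mathbb{P}_X(d\xi).$$

The second derivative then reads, when $\mathbb{P}_X$ is absolutely continuous with continuous density $\frac{d\mathbb{P}_X}{d\lambda}$,

$$\frac{\partial^2 \bar d_n^{\,2}}{\partial x_1^2}(X, \gamma) = 2 \int_{-\infty}^{x_1} \mathbb{P}_X(d\xi) + (x_2 - x_1)\, \frac{d\mathbb{P}_X}{d\lambda}(x_1),$$

$$\frac{\partial^2 \bar d_n^{\,2}}{\partial x_i^2}(X, \gamma) = (x_{i+1} - x_{i-1})\, \frac{d\mathbb{P}_X}{d\lambda}(x_i), \quad 2 \le i \le n-1,$$

$$\frac{\partial^2 \bar d_n^{\,2}}{\partial x_i \partial x_{i-1}}(X, \gamma) = \frac{\partial^2 \bar d_n^{\,2}}{\partial x_{i-1} \partial x_i}(X, \gamma) = -\int_{x_{i-1}}^{x_i} \mathbb{P}_X(d\xi), \quad 2 \le i \le n-1,$$

$$\frac{\partial^2 \bar d_n^{\,2}}{\partial x_i \partial x_{i+1}}(X, \gamma) = \frac{\partial^2 \bar d_n^{\,2}}{\partial x_{i+1} \partial x_i}(X, \gamma) = -\int_{x_i}^{x_{i+1}} \mathbb{P}_X(d\xi), \quad 2 \le i \le n-1,$$

$$\frac{\partial^2 \bar d_n^{\,2}}{\partial x_n^2}(X, \gamma) = 2 \int_{x_n}^{+\infty} \mathbb{P}_X(d\xi) + (x_n - x_{n-1})\, \frac{d\mathbb{P}_X}{d\lambda}(x_n).$$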

For most distributions, the above integral expressions can be evaluated in closed form. Therefore,
it is straightforward to implement a Newton method to find a zero of $\nabla d_n^2(X, \cdot)$, which yields an
optimal dual quantizer. Such a procedure, initialized with an equidistant grid in the center of
the distribution, usually converges very fast (fewer than 10 iterations) to an optimal grid.
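As an illustration of such a Newton procedure, here is a minimal sketch for the standard normal distribution, where the integrals reduce to the normal density and distribution function. It assumes the quadratic error with the nearest neighbor rule outside $[x_1, x_n]$, so the gradient and the tridiagonal Hessian are the first and second derivatives for the unbounded quadratic case. All function names, the initial grid width, and the tolerance are illustrative assumptions; no safeguard (damping, ordering check) is included, so convergence is not guaranteed for arbitrary starting grids.

```python
import math

INF = float("inf")


def pdf(x):
    """Standard normal density (evaluates to 0.0 at +/- infinity)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mass(a, b):
    """Integral of P_X(d xi) over [a, b]."""
    return cdf(b) - cdf(a)


def first_moment(a, b):
    """Integral of xi P_X(d xi) over [a, b]."""
    return pdf(a) - pdf(b)


def gradient(x):
    """Gradient of the quadratic dual quantization error (unbounded case)."""
    n = len(x)
    g = [0.0] * n
    for i in range(1, n - 1):
        g[i] = (first_moment(x[i - 1], x[i + 1])
                - x[i - 1] * mass(x[i - 1], x[i])
                - x[i + 1] * mass(x[i], x[i + 1]))
    g[0] = (2.0 * (x[0] * mass(-INF, x[0]) - first_moment(-INF, x[0]))
            - (x[1] * mass(x[0], x[1]) - first_moment(x[0], x[1])))
    g[-1] = (2.0 * (x[-1] * mass(x[-1], INF) - first_moment(x[-1], INF))
             + first_moment(x[-2], x[-1]) - x[-2] * mass(x[-2], x[-1]))
    return g


def hessian(x):
    """Diagonal and off-diagonal of the (tridiagonal, symmetric) Hessian."""
    n = len(x)
    diag = [0.0] * n
    for i in range(1, n - 1):
        diag[i] = (x[i + 1] - x[i - 1]) * pdf(x[i])
    diag[0] = 2.0 * mass(-INF, x[0]) + (x[1] - x[0]) * pdf(x[0])
    diag[-1] = 2.0 * mass(x[-1], INF) + (x[-1] - x[-2]) * pdf(x[-1])
    off = [-mass(x[i], x[i + 1]) for i in range(n - 1)]
    return diag, off


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system (no pivoting)."""
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        r[i] -= w * r[i - 1]
    sol = [0.0] * n
    sol[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        sol[i] = (r[i] - off[i] * sol[i + 1]) / d[i]
    return sol


def newton_dual_quantizer(n, width=2.0, tol=1e-12, max_iter=50):
    """Newton iteration started from an equidistant grid on [-width/2, width/2]."""
    x = [-width / 2 + width * i / (n - 1) for i in range(n)]
    for _ in range(max_iter):
        g = gradient(x)
        if max(abs(v) for v in g) < tol:
            break
        diag, off = hessian(x)
        step = solve_tridiagonal(diag, off, g)
        x = [xi - si for xi, si in zip(x, step)]
    return x


grid = newton_dual_quantizer(3)
print(grid)
```

For `n = 3` the result is a symmetric grid with the middle point at the origin. For larger grids a damped step or an ordering check may be needed in practice.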
5.2 Multi-dimensional setting



In the multi-dimensional case, the computation of $\nabla d_n^p(X, \cdot)$ involves the evaluation of
multi-dimensional integrals, for which in general no closed-form solution is available, and numerical
evaluation of these integrals is a rather time-consuming task.

We therefore focus, as in the case of regular quantization, on a stochastic gradient optimization
algorithm (also known as a "Robbins-Monro" zero search procedure for the gradient). Such
an algorithm has the advantage of building up the necessary gradient information step by step
during the simulation and is therefore several orders of magnitude faster than a "batch" approach,
which evaluates the full gradient at each iteration.

In the case of regular Voronoi vector quantization, this stochastic algorithm approach is also 
known as Competitive Vector Learning Quantization algorithm (CVLQ) (see 

Algorithm 1 CVLQ for dual Quantization
Input:

• Step sequence $a_k > 0$ such that $\sum_{k \ge 0} a_k = +\infty$ and $\sum_{k \ge 0} a_k^2 < +\infty$

• Initial grid $\gamma_0 \in (\mathbb{R}^d)^n$
Main loop:

for k = 0 to N - 1 do

    Generate i.i.d. sample $X_k \sim X$
    Set

        $\gamma_{k+1} \leftarrow \gamma_k - a_k \nabla_\gamma F_n^p(X_k, \gamma_k)$
end for
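To make the main loop concrete, here is a minimal one-dimensional sketch for $X \sim \mathcal{U}([0, 1])$ and $p = 2$ with the two endpoints fixed. In dimension one the "Delaunay triangle" is the grid interval containing the sample, its center is the interval midpoint, and the gradient of the local error with respect to an interval endpoint $x_j$ is $2 \lambda_j (x_j - z)$ with $\lambda_j$ the barycentric weight. The step sequence $a_k = 1/(k+2)$, the starting grid, and the function name are assumptions made for illustration only.

```python
import random
from bisect import bisect_right


def dual_cvlq_uniform(n, steps, seed=0):
    """Stochastic gradient descent for the quadratic dual quantization error
    of U([0, 1]) in dimension one, with endpoints anchored at 0 and 1."""
    rng = random.Random(seed)
    # ordered, deliberately non-equidistant starting grid (illustrative choice)
    x = [(i / (n - 1)) ** 2 for i in range(n)]
    for k in range(steps):
        a_k = 1.0 / (k + 2)  # sum a_k = +inf, sum a_k^2 < +inf
        xi = rng.random()
        # index i of the grid interval [x[i], x[i+1]] containing the sample
        i = min(bisect_right(x, xi) - 1, n - 2)
        lo, hi = x[i], x[i + 1]
        lam_lo = (hi - xi) / (hi - lo)  # barycentric weight of lo
        lam_hi = 1.0 - lam_lo           # barycentric weight of hi
        z = 0.5 * (lo + hi)             # center of the interval
        if i > 0:                       # x[0] is an anchor point
            x[i] = lo - a_k * 2.0 * lam_lo * (lo - z)
        if i + 1 < n - 1:               # x[n-1] is an anchor point
            x[i + 1] = hi - a_k * 2.0 * lam_hi * (hi - z)
    return x


grid = dual_cvlq_uniform(3, 20000)
print(grid)
```

With three points only the middle one moves, and it should settle near 0.5, the optimal dual grid for the uniform distribution. For larger grids this step sequence is likely too conservative and would need tuning.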

To compare this procedure to the regular CVLQ-algorithm, we inspect the main loop for the case
$p = 2$. Given a realization $X_k$ of $X$, we only have to replace the Nearest Neighbor search by
a search for the Delaunay triangle $I^*$ which contains $X_k$. According to Theorem the primal
solution $\lambda^*$ to the Linear Program $F_n^p(X_k, \gamma)$ is then given by the barycentric coordinates of $X_k$
in the triangle $I^*$ and the dual solution can be calculated by the formula

\[ u^* = 2 (z^* - X_k), \]

where $z^*$ is the center of the hypersphere spanning the triangle $I^*$ (its circumcenter). We therefore can
simplify the partial derivative of $F_n^p(X_k, (x_1, \dots, x_n))$ for $I^*$ being the Delaunay triangle containing $X_k$ to

\[ \frac{\partial}{\partial x_i} F_n^p(X_k, (x_1, \dots, x_n)) = 2 \lambda_i^* (x_i - z^*), \]

with $\lambda_i^* = 0$ for $i \notin I^*$.



Main loop: regular CVLQ

for k = 0 to N - 1 do

    • Generate i.i.d. sample $X_k \sim X$

    • Find NN index $i^*$ of $X_k$ in $\{x_1^k, \dots, x_n^k\}$

    for j = 1 to n do
        if $j = i^*$ then
            $x_j^{k+1} \leftarrow x_j^k - a_k (x_j^k - X_k)$
        else
            $x_j^{k+1} \leftarrow x_j^k$
        end if
    end for
end for

Main loop: CVLQ for dual quantization

for k = 0 to N - 1 do

    • Generate i.i.d. sample $X_k \sim X$

    • Find Delaunay triangle $I^*$ in $\{x_1^k, \dots, x_n^k\}$, which contains $X_k$

    • Compute LP solution $\lambda^*$ and center $z^*$

    for j = 1 to n do
        if $j \in I^*$ then
            $x_j^{k+1} \leftarrow x_j^k - a_k \lambda_j^* (x_j^k - z^*)$
        else
            $x_j^{k+1} \leftarrow x_j^k$
        end if
    end for
end for
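The ingredients of one dual CVLQ step in dimension two can be sketched as follows, assuming the Delaunay triangle containing the sample has already been found (the triangle search itself is not shown). The functions `barycentric`, `circumcenter`, `local_error`, `gradient` and `dual_cvlq_step` are hypothetical helpers; the step below uses the full gradient $2 \lambda_j^* (x_j - z^*)$, whereas the loop above absorbs the constant factor into the step size.

```python
def barycentric(tri, p):
    """Barycentric coordinates of the point p in the triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (p[0] - cx) + (cx - bx) * (p[1] - cy)) / det
    l2 = ((cy - ay) * (p[0] - cx) + (ax - cx) * (p[1] - cy)) / det
    return (l1, l2, 1.0 - l1 - l2)


def circumcenter(tri):
    """Center z* of the circle through the three vertices of tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)


def local_error(tri, p):
    """Quadratic local dual error: sum_j lambda_j |p - x_j|^2."""
    lam = barycentric(tri, p)
    return sum(l * ((p[0] - x) ** 2 + (p[1] - y) ** 2)
               for l, (x, y) in zip(lam, tri))


def gradient(tri, p):
    """Partial derivatives 2 * lambda_j * (x_j - z*) for the three vertices."""
    lam = barycentric(tri, p)
    zx, zy = circumcenter(tri)
    return [(2.0 * l * (x - zx), 2.0 * l * (y - zy))
            for l, (x, y) in zip(lam, tri)]


def dual_cvlq_step(tri, p, a_k):
    """Move the three vertices of the triangle by one gradient step."""
    return [(x - a_k * gx, y - a_k * gy)
            for (x, y), (gx, gy) in zip(tri, gradient(tri, p))]


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
sample = (0.25, 0.25)
print(barycentric(triangle, sample))
print(circumcenter(triangle))
print(dual_cvlq_step(triangle, sample, 0.1))
```

A finite-difference comparison of `gradient` against `local_error` is a convenient sanity check for the derivative formula on any non-degenerate triangle.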
These procedures usually converge quickly to a first approximation of an optimal quantization
grid. For a local refinement, we propose to combine the above approach with a few quasi-Newton
steps of a deterministic optimization algorithm, where the integral expressions are evaluated
by a Monte Carlo or Quasi-Monte Carlo method (see [IB]). Concerning the
uniform distribution on $[0, 1]^2$ below, note that we considered the variant (23) of the quadratic
mean dual quantization error, where the four vertices of the unit square are "anchor points".

Numerical results obtained from this approach are given for the uniform distribution on $[0, 1]^2$
in figures 1 to 2 with grid sizes 8 to 16, for the standard normal distribution on $\mathbb{R}^2$ for a grid size
of 250 in figure 3, and for the joint distribution of the standard Brownian motion at time 1 and
its supremum over the unit interval in figure 4.

Acknowledgement: The authors thank one of the referees for his extremely careful reading of the 
manuscript. 
Figure 4: Dual Quantization of the joint distribution of a Brownian motion at T = 1 and its
supremum over [0, 1] (N = 250).
Appendix

The table below provides in a synthetic way the respective main features of both Voronoi and 
Delaunay (dual) quantization. 

Let $\Gamma = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$ be a grid of size $N \ge 1$ and let $F : \mathbb{R}^d \to \mathbb{R}$ be a function.



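| quantization mode | iq = vq (Voronoi) | iq = dq (Delaunay) |
|---|---|---|
| | $\hat\xi^{vq} \in \operatorname{argmin}_{x_k \in \Gamma} \|\xi - x_k\|$ | $\hat\xi^{dq} = \mathcal{J}^*_\Gamma(\omega_0, \xi)$ |
| $X : \Omega \to \mathbb{R}^d$ | $\hat X^{vq} = \pi_\Gamma(X)$ | $\hat X^{dq}(\omega_0, \omega) = \mathcal{J}^*_\Gamma(\omega_0, X(\omega))$ |
| $E(\hat X^{iq} \mid X = \xi)$ | | |
| $E(X \mid \hat X^{iq} = x_k)$ | $x_k$ (only if $\Gamma$ is $L^2(P_X)$-optimal) | ✗ |
| $E(F(\hat X^{iq}) \mid X = \xi)$ (funct. approx. op.) | $F(\hat\xi^{vq}) = (F \circ \pi_\Gamma)(\xi)$ (stepwise constant); $\le F(\xi) + [F]_{\mathrm{Lip}} \, \mathrm{dist}(\xi, \Gamma)$ | $\mathcal{J}^*_\Gamma(F)(\xi) := E_{P_0}\big(F(\mathcal{J}^*_\Gamma(\omega_0, \xi))\big) = \sum_{x_k \in \Gamma} \lambda_k(\xi) F(x_k)$ (Lipschitz & stepwise affine on $\mathrm{conv}(\Gamma)$); $\le F(\xi) + [DF]_{\mathrm{Lip}} \, E_{P_0}\big(\|\hat\xi^{dq} - \xi\|^2\big)$ |
| $E(F(X) \mid \hat X^{iq} = x_k)$ | only if $\Gamma$ is $L^2(P_X)$-optimal | ✗ |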



In particular, this table shows that both quantization methods are connected with a functional
approximation operator:

- Voronoi quantization with a projection operator ($F \mapsto F \circ \pi_\Gamma$) on stepwise constant functions,

- Delaunay quantization with an interpolation operator ($F \mapsto \mathcal{J}^*_\Gamma(F)$) on stepwise affine functions.

These two operators are intrinsic in the sense that they do not depend on the distribution of the
random vector X.
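In dimension one, the two operators can be illustrated with a short sketch: the Voronoi operator evaluates $F$ at the nearest grid point, while the Delaunay (dual) operator interpolates linearly between the two neighboring grid points. The function names and the restriction to an ordered one-dimensional grid are assumptions made for this example.

```python
from bisect import bisect_right


def voronoi_operator(F, grid):
    """Stepwise constant approximation: xi -> F(nearest grid point)."""
    def approx(xi):
        nearest = min(grid, key=lambda x: abs(x - xi))
        return F(nearest)
    return approx


def dual_operator(F, grid):
    """Stepwise affine approximation on conv(grid): xi -> sum_k lambda_k F(x_k)."""
    def approx(xi):
        if not grid[0] <= xi <= grid[-1]:
            raise ValueError("xi must lie in the convex hull of the grid")
        i = min(bisect_right(grid, xi) - 1, len(grid) - 2)
        lo, hi = grid[i], grid[i + 1]
        lam = (hi - xi) / (hi - lo)  # barycentric weight of lo
        return lam * F(lo) + (1.0 - lam) * F(hi)
    return approx


grid = [0.0, 0.5, 1.0]
affine = lambda x: 3.0 * x + 1.0
print(voronoi_operator(affine, grid)(0.3))  # value of F at the nearest point 0.5
print(dual_operator(affine, grid)(0.3))     # linear interpolation of F at 0.3
```

Neither function uses the distribution of $X$, which reflects the intrinsic character of both operators. For an affine $F$ the dual operator reproduces $F$ exactly on the convex hull of the grid.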